It is shown that polar codes achieve the symmetric capacity of discrete memoryless channels with
arbitrary input alphabet sizes. It is shown that in general, channel polarization happens in several, rather
than only two levels so that the synthesized channels are either useless, perfect or "partially perfect".
Any subset of the channel input alphabet which is closed under addition induces a coset partition of
the alphabet through its shifts. For any such partition of the input alphabet, there exists a corresponding
partially perfect channel whose outputs uniquely determine the coset to which the channel input belongs.
By a slight modification of the encoding and decoding rules, it is shown that perfect transmission of
certain information symbols over partially perfect channels is possible. Our result is general regarding
both the cardinality and the algebraic structure of the channel input alphabet; i.e., we show that for any
channel input alphabet size and any Abelian group structure on the alphabet, polar codes are optimal. It
is also shown through an example that polar codes, when considered as group/coset codes, do not achieve
the capacity achievable using coset codes over arbitrary channels.

Index Terms 

Polar codes, Channel polarization, Group codes, Discrete memoryless channels

I. Introduction 

Polar codes were originally proposed by Arikan in [1] for discrete memoryless channels with a binary
input alphabet. Polar codes over binary input channels are shifted linear (coset) codes capable of achieving
the symmetric capacity of channels. These codes are constructed based on the Kronecker power of the $2 \times 2$
matrix $\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ and are the first known class of capacity achieving codes with an explicit construction.
It is known that non-binary codes outperform binary codes in certain communication settings. Therefore,
constructing capacity achieving codes for channels of arbitrary input alphabet sizes is of great interest. 
In order to construct capacity achieving codes over non-binary channels, there have been attempts to 
extend polar coding techniques to channels of arbitrary input alphabet sizes. It is shown in [2] that polar
codes achieve the symmetric capacity of channels when the size of the input alphabet is a prime. For
channels of arbitrary input alphabet sizes, it is shown in [2] that the original construction of polar codes
does not necessarily achieve the symmetric capacity of the channel due to the fact that polarization (into
two levels) may not occur for arbitrary channels. In the same paper, a randomized construction of polar
codes based on permutations is proposed. In this approach, the existence of a polarizing transformation
is shown by a (small) random coding argument over the ensemble of permutations of the input alphabet.
In another approach in [2], a code construction method is proposed which is based on the decomposition
of the composite input channel into sub-channels of prime input alphabet sizes. In this multilevel code
construction method, a separate polar code is designed for each sub-channel of prime input alphabet
size. It is shown in [3] that for channels for which the input alphabet size is a prime power, polar codes
defined on the input alphabet can achieve the symmetric capacity without the need to use multilevel code 
construction methods. 

Another related work is [4], in which the authors have shown that polar codes are sufficient to achieve
the uniform sum rate on any binary input MAC, and it is stated that the same technique can be used
for the point-to-point problem to achieve the symmetric capacity of the channel when the size of the
alphabet is a power of 2. In a recent work, it has been shown in [5] that polar codes achieve the capacity
of channels with input alphabet size a power of 2.

In this paper, we show that with a slight modification of the encoding and decoding rules, standard 
polar codes are sufficient to achieve the symmetric capacity of all discrete memoryless channels. Our 
result is general regarding both the cardinality and the algebraic structure of the channel input alphabet; 
i.e., we show that for any channel input alphabet size and any Abelian group structure on the alphabet,
polar codes are optimal. This result was first reported in [6]. We use a combination of algebraic and
coding techniques and show that in general, channel polarization occurs in several levels rather than only
two: Suppose the channel input alphabet is $G$ and is endowed with an Abelian group structure. Then
for any subset $H$ of the channel input alphabet $G$ which is closed under addition (i.e. any subgroup of
$G$), there may exist a corresponding polarized channel which can perfectly transmit the index of the
shift (coset) of $H$ in $G$ which contains the input. As an example, for a channel of input $\mathbb{Z}_6$, there are
four subgroups of the input alphabet: i) {0} with cosets {0}, {1}, {2}, {3}, {4} and {5}, ii) {0, 3} with
cosets {0, 3}, {1, 4} and {2, 5}, iii) {0, 2, 4} with cosets {0, 2, 4} and {1, 3, 5} and iv) $\mathbb{Z}_6$. For polar
codes over $\mathbb{Z}_6$, the asymptotic synthesized channels can exist in four forms: i) can determine which one
of the cosets {0}, {1}, {2}, {3}, {4} or {5} contains the input symbol (perfect channels with capacity
$\log_2 6$ bits per channel use), ii) can determine which one of the cosets {0, 3}, {1, 4} or {2, 5} contains
the input symbol (partially perfect channels with capacity $\log_2 3$ bits per channel use), iii) can determine
which one of the cosets {0, 2, 4} or {1, 3, 5} contains the input symbol (partially perfect channels with
capacity 1 bit per channel use), iv) can only determine that the input belongs to {0, 1, 2, 3, 4, 5} (useless
channel). Cases i, ii, iii and iv correspond to coset decompositions of $\mathbb{Z}_6$ based on subgroups {0}, {0, 3},
{0, 2, 4} and {0, 1, 2, 3, 4, 5} respectively.

Although standard binary polar codes are group (linear) codes, the class of capacity achieving codes 
constructed and analyzed in this paper are not group codes. It is known that group codes do not generally 
achieve the symmetric capacity of discrete memoryless channels [7]. Hence, one could have predicted
that standard polar codes cannot achieve the symmetric capacity of arbitrary channels and a modification 
of the encoding rule is indeed necessary to achieve that goal. Due to the modifications we make to the 
encoding rule of polar codes, the constructed codes fall into a larger class of structured codes called 
nested group codes. 

The paper is organized as follows: In Section II, some definitions and basic facts are stated which are
used in the paper. In Section III, we present two motivating examples of 4-ary and 6-ary channels and
observe the polarization effect on these channels. In Section IV, we show that polar codes achieve the
symmetric capacity of channels with input alphabet size $q = p^r$ where $p$ is a prime and $r$ is an integer.
This result is generalized to arbitrary channels in Section V. In Section VI, the relation of polar codes to
group codes is discussed and two examples of channels over $\mathbb{Z}_4$ are provided. In the first example, we
show that polar codes approach the capacity of channels achievable using group codes. The intent of the
second example is to show that this is not generally the case; i.e. polar codes do not generally approach
the capacity of channels achievable using group/coset codes.



II. Preliminaries 

1) Source and Channel Models: We consider discrete memoryless and stationary channels used without
feedback. We associate two finite sets $\mathcal{X}$ and $\mathcal{Y}$ with the channel as the channel input and output alphabets.
These channels can be characterized by a conditional probability law $W(y|x)$ for $x \in \mathcal{X}$ and $y \in \mathcal{Y}$.
The channel is specified by $(\mathcal{X}, \mathcal{Y}, W)$. The source of information generates messages over the set
$\{1, 2, \ldots, M\}$ uniformly for some positive integer $M$.

2) Achievability and Capacity: A transmission system with parameters $(n, M, \tau)$ for reliable communication
over a given channel $(\mathcal{X}, \mathcal{Y}, W)$ consists of an encoding mapping $e : \{1, 2, \ldots, M\} \to \mathcal{X}^n$ and
a decoding mapping $d : \mathcal{Y}^n \to \{1, 2, \ldots, M\}$ such that

$$\frac{1}{M} \sum_{m=1}^{M} W^n\left(d(Y^n) \neq m \,\middle|\, X^n = e(m)\right) \le \tau$$

Given a channel $(\mathcal{X}, \mathcal{Y}, W)$, the rate $R$ is said to be achievable if for all $\epsilon > 0$ and for all sufficiently
large $n$, there exists a transmission system for reliable communication with parameters $(n, M, \tau)$ such
that

$$\frac{1}{n} \log M \ge R - \epsilon, \qquad \tau \le \epsilon$$


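3) Symmetric Capacity and the Bhattacharyya Parameter: For a channel $(\mathcal{X}, \mathcal{Y}, W)$, the symmetric
capacity is defined as $I^0(W) = I(X; Y)$ where the channel input $X$ is uniformly distributed over $\mathcal{X}$ and
$Y$ is the output of the channel; i.e. for $q = |\mathcal{X}|$,

$$I^0(W) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \frac{1}{q} W(y|x) \log \frac{W(y|x)}{\sum_{\tilde{x} \in \mathcal{X}} \frac{1}{q} W(y|\tilde{x})}$$

The Bhattacharyya distance between two distinct input symbols $x$ and $\tilde{x}$ is defined as

$$Z(W_{\{x,\tilde{x}\}}) = \sum_{y \in \mathcal{Y}} \sqrt{W(y|x) W(y|\tilde{x})}$$

and the average Bhattacharyya distance is defined as

$$Z(W) = \frac{1}{q(q-1)} \sum_{\substack{x, \tilde{x} \in \mathcal{X} \\ x \neq \tilde{x}}} Z(W_{\{x,\tilde{x}\}})$$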

4) Binary Polar Codes: For any $N = 2^n$, a polar code of length $N$ designed for the channel $(\mathbb{Z}_2, \mathcal{Y}, W)$
is a linear code characterized by a generator matrix $G_N$ and a set of indices $A \subseteq \{1, \ldots, N\}$ of perfect
channels. The generator matrix for polar codes is defined as $G_N = B_N F^{\otimes n}$ where $B_N$ is a permutation
of rows, $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ and $\otimes$ denotes the Kronecker product. The set $A$ is a function of the channel.
The decoding algorithm for polar codes is a specific form of successive cancellation [1].
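
As an illustration of the definition $G_N = B_N F^{\otimes n}$, the following minimal sketch builds the generator matrix and encodes a vector over $\mathbb{Z}_q$. It assumes that $B_N$ is the bit-reversal permutation of rows (the text only states that $B_N$ is a row permutation), and the helper names `kron`, `bit_reverse`, `polar_generator` and `encode` are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two integer matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def bit_reverse(i, n):
    """Reverse the n-bit binary representation of i (n >= 1)."""
    return int(format(i, "0{}b".format(n))[::-1], 2)


def polar_generator(n):
    """Return G_N = B_N F^{(x) n} for N = 2^n, with B_N assumed to be bit reversal."""
    F = [[1, 0], [1, 1]]
    G = [[1]]
    for _ in range(n):
        G = kron(G, F)
    return [G[bit_reverse(i, n)] for i in range(1 << n)]


def encode(u, G, q=2):
    """Compute x = u G with addition in Z_q; G is {0, 1}-valued."""
    N = len(G)
    return [sum(u[i] * G[i][j] for i in range(N)) % q for j in range(N)]


if __name__ == "__main__":
    G4 = polar_generator(2)
    for row in G4:
        print(row)
    print(encode([1, 1, 0, 0], G4, q=2))
```

Because the matrix is {0, 1}-valued, the same `encode` routine only needs the group addition, so it can be reused with `q = 4` or `q = 6`.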

5) Groups, Rings and Fields: All groups referred to in this paper are Abelian groups. Given a group
$(G, +)$, a subset $H$ of $G$ is called a subgroup of $G$ if it is closed under the group operation. In this
case, $(H, +)$ is a group in its own right. This is denoted by $H \le G$. A coset $C$ of a subgroup $H$ is a
shift of $H$ by an arbitrary element $a \in G$ (i.e. $C = a + H$ for some $a \in G$). For any subgroup $H$ of
$G$, its cosets partition the group $G$. A transversal $T$ of a subgroup $H$ of $G$ is a subset of $G$ containing
one and only one element from each coset (shift) of $H$.

We give some examples in the following: The simplest non-trivial example of groups is $\mathbb{Z}_2$ with addition
mod-2 which is a ring and a field with multiplication mod-2. The group $\mathbb{Z}_2 \times \mathbb{Z}_2$ is also a ring and a
field under component-wise mod-2 addition and a carefully defined multiplication. The group $\mathbb{Z}_4$ with
mod-4 addition and multiplication is a ring but not a field since the element $2 \in \mathbb{Z}_4$ does not have a
multiplicative inverse. The subset {0, 2} is a subgroup of $\mathbb{Z}_4$ since it is closed under mod-4 addition.
{0} and $\mathbb{Z}_4$ are the two other subgroups of $\mathbb{Z}_4$. The group $\mathbb{Z}_6$ is neither a field nor a ring. Subgroups of
$\mathbb{Z}_6$ are: {0}, {0, 3}, {0, 2, 4} and $\mathbb{Z}_6$.
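
The subgroup, coset and transversal examples above can be reproduced with a short sketch for cyclic groups $\mathbb{Z}_n$. It relies on the standard fact that every subgroup of $\mathbb{Z}_n$ is generated by a divisor of $n$; the function names are hypothetical, and the transversal returned (smallest element of each coset) is just one valid choice.

```python
def subgroups(n):
    """All subgroups of Z_n; each one is generated by a divisor d of n."""
    return [list(range(0, n, d)) for d in range(1, n + 1) if n % d == 0]


def cosets(n, H):
    """Partition Z_n into the cosets (shifts) a + H of the subgroup H."""
    seen, out = set(), []
    for a in range(n):
        if a not in seen:
            coset = sorted((a + h) % n for h in H)
            out.append(coset)
            seen.update(coset)
    return out


def transversal(n, H):
    """One element (the smallest) from each coset of H in Z_n."""
    return [coset[0] for coset in cosets(n, H)]


if __name__ == "__main__":
    for H in subgroups(6):
        print(H, cosets(6, H), transversal(6, H))
```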

6) Polar Codes Over Abelian Groups: For any discrete memoryless channel, there always exists 
an Abelian group of the same size as that of the channel input alphabet. In general, for an Abelian 
group, there may not exist a multiplication operation. Since polar encoders are characterized by a matrix 
multiplication, before using these codes for channels of arbitrary input alphabet sizes, a generator matrix 
for codes over Abelian groups needs to be properly defined. In Appendix A, a convention is introduced
to generate codes over groups using {0, l}-valued generator matrices. 



7) Group Codes: Let the channel input alphabet $\mathcal{X}$ be equipped with the structure of a finite Abelian
group $G$ of the same size. Then the channel is specified by $(G, \mathcal{Y}, W)$. A group code over $G$ of length
$N$ for this channel is any subgroup of $G^N$. The group capacity of a channel $(G, \mathcal{Y}, W)$ is the maximum
achievable rate using group codes over $G$ for this channel. Group codes generalize the notion of linear
codes over fields to channels with composite input alphabet sizes. A coset code is a shift of a group code
by a constant vector.

8) Notation: We denote by $O(\epsilon)$ any function of $\epsilon$ which is right-continuous around 0 and such that
$O(\epsilon) \to 0$ as $\epsilon \downarrow 0$. We denote by $a \approx_\epsilon b$ to mean $a = b + O(\epsilon)$.

For positive integers $N$ and $r$, let $\{A_0, A_1, \ldots, A_r\}$ be a partition of the index set $\{1, 2, \ldots, N\}$. Given
sets $T_t$ for $t = 0, \ldots, r$, the direct sum $\bigoplus_{t=0}^{r} T_t^{A_t}$ is defined as the set of all tuples $u_1^N = (u_1, \ldots, u_N)$
such that $u_i \in T_t$ whenever $i \in A_t$.



III. Motivating Examples 

A key property of the basic polarizing transforms used for binary polar codes is that they have perfect 
and useless channels as their "fixed points"; in the sense that, if these transforms are applied to a perfect 
(useless) channel, the resulting channel is also perfect (useless). In the following, we try to demonstrate 
that for non-binary channels, the basic transforms have fixed points which are neither perfect nor useless. 
Consider a 4-ary channel $(\mathbb{Z}_4, \mathcal{Y}, W)$ and assume the channel is such that $W(y|u) = W(y|u + 2)$ for
all $y \in \mathcal{Y}$ and all $u \in \mathbb{Z}_4$; i.e. the channel cannot distinguish between inputs $u$ and $u + 2$. Consider the
transformed channels $W^-$ and $W^+$ originally introduced in [1] (refer to Equations (5) and (6) of the
current paper). It turns out that

$$W^+(y_1, y_2, u_1 | u_2) = W^+(y_1, y_2, u_1 | u_2 + 2)$$
$$W^-(y_1, y_2 | u_1) = W^-(y_1, y_2 | u_1 + 2)$$

for all $y_1, y_2 \in \mathcal{Y}$ and all $u_1, u_2 \in \mathbb{Z}_4$. This observation is closely related to the fact that {0, 2} is closed
under addition mod-4; i.e. the fact that {0, 2} forms a subgroup of $\mathbb{Z}_4$. This means that the transformed
channels inherit this characteristic feature of the original channel, in the sense that they cannot distinguish
between inputs $u_i$ and $u_i + 2$ ($i = 2$ for $W^+$ and $i = 1$ for $W^-$). This suggests that even in the asymptotic
regime, the transformed channels can only distinguish between the sets {0, 2} and {1, 3}, and not within
each set. In the following, we give an example for which such cases indeed exist in the asymptotic regime.
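
The inherited invariance can be checked numerically. The sketch below assumes a hypothetical 4-ary channel with three output symbols, chosen only for illustration so that the rows for inputs $u$ and $u + 2$ coincide; `w_minus` and `w_plus` are illustrative names for the transforms $W^-$ and $W^+$.

```python
from itertools import product

Q = 4  # input alphabet Z_4

# Hypothetical transition matrix W[x][y] with W(y|u) = W(y|u+2):
# rows 0 and 2 coincide, and rows 1 and 3 coincide.
W = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3]]
NY = len(W[0])


def w_minus(y1, y2, u1):
    """W^-(y1, y2 | u1) = sum over u2 of (1/q) W(y1 | u1 + u2) W(y2 | u2)."""
    return sum(W[(u1 + u2) % Q][y1] * W[u2][y2] for u2 in range(Q)) / Q


def w_plus(y1, y2, u1, u2):
    """W^+(y1, y2, u1 | u2) = (1/q) W(y1 | u1 + u2) W(y2 | u2)."""
    return W[(u1 + u2) % Q][y1] * W[u2][y2] / Q


def max_shift_gap():
    """Largest change in W^- or W^+ when the channel input is shifted by 2."""
    gap = 0.0
    for y1, y2, u1 in product(range(NY), range(NY), range(Q)):
        gap = max(gap, abs(w_minus(y1, y2, u1) - w_minus(y1, y2, (u1 + 2) % Q)))
        for u2 in range(Q):
            shifted = w_plus(y1, y2, u1, (u2 + 2) % Q)
            gap = max(gap, abs(w_plus(y1, y2, u1, u2) - shifted))
    return gap


if __name__ == "__main__":
    print(max_shift_gap())
```

A gap that is zero up to floating-point rounding indicates that neither transformed channel can separate an input from its shift by 2.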

Consider the channel depicted in Figure 1. For this channel, the symmetric capacity is equal to $C =
I(X; Y) = 2 - \epsilon - 2\lambda$. Depending on the values of the parameters $\epsilon$ and $\lambda$, this channel can present three
extreme cases: 1) If $\lambda = 1$, this channel is useless. 2) If $\epsilon = 1$, this channel cannot distinguish between
inputs $u$ and $u + 2$ and has a capacity of 1 bit per channel use. 3) If $\epsilon = \lambda = 0$, this channel is perfect
and has a capacity of 2 bits per channel use.




Fig. 1: Channel 1: The input of the channel has the structure of the group $\mathbb{Z}_4$. The parameters $\epsilon$ and $\lambda$ take values
from [0, 1] such that $\epsilon + \lambda \le 1$. $E_1$ and $E_2$ are erasures connected to cosets of the subgroup {0, 2}. The lines
connecting the output symbols 0, 2, 1, 3 to their corresponding inputs represent a conditional probability of $1 - \epsilon - \lambda$.
For this channel, the process $I(W^{b_1 b_2 \cdots b_n})$ can be explicitly found for each $n$ and the multilevel polarization can
be observed.

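Given a sequence of bits $b_1 b_2 \cdots b_n$, define $W^{b_1 b_2 \cdots b_n}$ as in [1, Section IV], and let $I(W^{b_1 b_2 \cdots b_n})$ be the
mutual information between the input and output of $W^{b_1 b_2 \cdots b_n}$ when the input is uniformly distributed.
We can find $I(W^{b_1 b_2 \cdots b_n})$ using the following recursion, for which the proof can be found in Appendix B.

Define $\epsilon_0 = \epsilon$ and $\lambda_0 = \lambda$. For $i = 1, \ldots, n$:

If $b_i = 1$, let

$$\epsilon_i = \epsilon_{i-1}^2 + 2\epsilon_{i-1}\lambda_{i-1}, \qquad \lambda_i = \lambda_{i-1}^2 \tag{1}$$

If $b_i = 0$, let

$$\epsilon_i = 2\epsilon_{i-1} - (\epsilon_{i-1}^2 + 2\epsilon_{i-1}\lambda_{i-1}), \qquad \lambda_i = 2\lambda_{i-1} - \lambda_{i-1}^2 \tag{2}$$

Then we have $I(W^{b_1 b_2 \cdots b_n}) = 2 - \epsilon_n - 2\lambda_n$.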
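
The recursion can be run directly to observe the three-level polarization. The following is a minimal sketch of Equations (1) and (2) with hypothetical function names; it enumerates all $2^n$ bit sequences, so it is only meant for moderate $n$.

```python
def step(eps, lam, b):
    """One step of the Channel 1 recursion: Equation (1) if b == 1, Equation (2) if b == 0."""
    if b == 1:
        return eps * eps + 2 * eps * lam, lam * lam
    return 2 * eps - (eps * eps + 2 * eps * lam), 2 * lam - lam * lam


def polarize(eps, lam, n):
    """Mutual informations 2 - eps_n - 2 lam_n of all 2^n synthesized channels."""
    params = [(eps, lam)]
    for _ in range(n):
        params = [step(e, l, b) for (e, l) in params for b in (0, 1)]
    return [2 - e - 2 * l for (e, l) in params]


if __name__ == "__main__":
    values = sorted(polarize(0.4, 0.2, 10))
    # The empirical mean stays at the symmetric capacity 2 - eps - 2 lam.
    print(sum(values) / len(values))
    # Smallest, median and largest mutual information among the 2^10 channels.
    print(values[0], values[len(values) // 2], values[-1])
```

Averaging Equations (1) and (2) leaves $\epsilon$ and $\lambda$ unchanged, which is why the empirical mean of the mutual information stays at $2 - \epsilon - 2\lambda$ for every $n$.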

Consider the function $f : [0, 1]^2 \to [0, 1]^2$, $f(\epsilon, \lambda) = (\epsilon^2 + 2\epsilon\lambda, \lambda^2)$ corresponding to Equation (1).
The fixed points of this function are given by (0, 1), (1, 0) and (0, 0). Similarly, consider the function
$g : [0, 1]^2 \to [0, 1]^2$, $g(\epsilon, \lambda) = (2\epsilon - (\epsilon^2 + 2\epsilon\lambda), 2\lambda - \lambda^2)$ corresponding to Equation (2). It turns out
that the fixed points of $g$ are the same as those of $f$. This suggests that in the limit, the transformed
channels converge to one of the three extreme cases discussed above. Figures 2 and 3 show that this is indeed
the case and depict the three-level polarization of the mutual information process $I(W^{b_1 b_2 \cdots b_n})$ to a
discrete random variable $I^\infty$ as $n$ grows.

Fig. 2: The behavior of $I(W^{b_1 b_2 \cdots b_n})$ for a fixed $n$ for Channel 1 when $\epsilon = 0.4$ and $\lambda = 0.2$. The three solid lines
represent the three discrete values of $I^\infty$ with positive probability.



Fig. 3: The asymptotic behavior of $I(W^{b_1 b_2 \cdots b_n})$ for several values of $N = 2^n$ for Channel 1 when the data is sorted.
We observe that for this channel, all three extreme cases appear with positive probability. In general, it is possible
to have fewer cases in the asymptotic regime.

When $N = 2^n$ is large, let $N_0$ be the number of useless channels (corresponding to the width of the
first step in Figure 3), $N_1$ be the number of partially perfect channels (corresponding to the width of the
second step in Figure 3) and $N_2$ be the number of perfect channels (corresponding to the width of the
third step in Figure 3). Since the mutual information process is a martingale, it follows that

$$C = E\{I^\infty\} \approx \frac{N_0}{N} \times 0 + \frac{N_1}{N} \times 1 + \frac{N_2}{N} \times 2$$

where $C$ is the symmetric capacity of the channel. Consider the following encoding rule: For indices
corresponding to useless channels, let the input symbol take values from {0} (from the transversal of
the subgroup $\mathbb{Z}_4$ of $\mathbb{Z}_4$; i.e. fix the input). For indices corresponding to partially perfect channels, let
the input symbol take values from {0, 1} (from the transversal of the subgroup {0, 2} of $\mathbb{Z}_4$). For
indices corresponding to perfect channels, let the input symbol take values from $\mathbb{Z}_4$ (choose information
symbols from the transversal of the subgroup {0} of $\mathbb{Z}_4$). It turns out that this encoding rule used with
an appropriate decoding rule has a vanishingly small probability of error as $N$ becomes large. The rate
of this code is equal to

$$R = \frac{1}{N} \left( N_0 \log_2 1 + N_1 \log_2 2 + N_2 \log_2 4 \right)$$

This means $R = C$ is achievable using polar codes.
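
The counting argument can be illustrated numerically. The sketch below reuses the Channel 1 recursion and, as an assumption made only for illustration, assigns each synthesized channel to the nearest of the three limiting values 0, 1 and 2 bits; the paper's actual index sets are defined asymptotically, so at finite $n$ this rounding is only a rough proxy. All names are hypothetical.

```python
from math import log2


def step(eps, lam, b):
    """Channel 1 recursion: Equation (1) for b == 1, Equation (2) for b == 0."""
    if b == 1:
        return eps * eps + 2 * eps * lam, lam * lam
    return 2 * eps - (eps * eps + 2 * eps * lam), 2 * lam - lam * lam


def level_counts(eps, lam, n):
    """Return [N0, N1, N2]: channels nearest to 0, 1 and 2 bits respectively."""
    params = [(eps, lam)]
    for _ in range(n):
        params = [step(e, l, b) for (e, l) in params for b in (0, 1)]
    counts = [0, 0, 0]
    for e, l in params:
        info = 2 - e - 2 * l
        # Illustrative rule (an assumption): pick the closest limiting value.
        level = min((0, 1, 2), key=lambda v: abs(info - v))
        counts[level] += 1
    return counts


def rate(counts):
    """R = (N0 log2 1 + N1 log2 2 + N2 log2 4) / N."""
    n0, n1, n2 = counts
    return (n0 * log2(1) + n1 * log2(2) + n2 * log2(4)) / (n0 + n1 + n2)


if __name__ == "__main__":
    counts = level_counts(0.4, 0.2, 12)
    print(counts, rate(counts), 2 - 0.4 - 2 * 0.2)
```

Symbols on the three kinds of indices are drawn from {0}, {0, 1} and $\mathbb{Z}_4$, which is where the weights $\log_2 1$, $\log_2 2$ and $\log_2 4$ come from.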



Next, we consider a channel with a composite input alphabet size. Consider the channel depicted in
Figure 4. We call this Channel 2. It turns out that given a sequence of bits $b_1 b_2 \cdots b_n$, the transformed
channel $W^{b_1 b_2 \cdots b_n}$ is (equivalent to) a channel of the same type as Channel 2 but with possibly different
parameters $\epsilon$, $\lambda$ and $\gamma$. At each step $n$, the corresponding parameters can be found using the following
recursion: Define $\epsilon_0 = \epsilon$, $\lambda_0 = \lambda$ and $\gamma_0 = \gamma$. For $i = 1, \ldots, n$:

If $b_i = 1$, let

$$\gamma_i = \gamma_{i-1}^2 + 2\gamma_{i-1}\lambda_{i-1}, \qquad \epsilon_i = \epsilon_{i-1}^2 + 2\epsilon_{i-1}\lambda_{i-1}, \qquad \lambda_i = \lambda_{i-1}^2 \tag{3}$$

If $b_i = 0$, let

$$\gamma_i = 2\gamma_{i-1} - (\gamma_{i-1}^2 + 2\gamma_{i-1}\lambda_{i-1}), \qquad \epsilon_i = 2\epsilon_{i-1} - (\epsilon_{i-1}^2 + 2\epsilon_{i-1}\lambda_{i-1}), \qquad \lambda_i = 2\lambda_{i-1} - \lambda_{i-1}^2 \tag{4}$$

Then we have

$$I(W^{b_1 b_2 \cdots b_n}) = \log_2 6 - \gamma_n \log_2 2 - \epsilon_n \log_2 3 - \lambda_n \log_2 6$$

Fig. 4: Channel 2: A channel with a composite input alphabet size. For this channel, the process $I^n$ can be explicitly
found for each $n$ and the multilevel polarization can be observed. $E_1$, $E_2$ and $E_3$ are erasures corresponding to
cosets of the subgroup {0, 3} and $E_4$ and $E_5$ are erasures corresponding to cosets of the subgroup {0, 2, 4}. The
lines connected to outputs $E_1$, $E_2$ and $E_3$ correspond to a conditional probability of $\gamma$, the lines connected to
outputs $E_4$ and $E_5$ correspond to a conditional probability of $\epsilon$, the lines connected to the output $E_6$ correspond
to a conditional probability of $\lambda$, and the lines connected to outputs 0, 1, 2, 3, 4 and 5 correspond to a conditional
probability of $1 - \gamma - \epsilon - \lambda$. The parameters $\gamma, \epsilon, \lambda$ take values from [0, 1] such that $\gamma + \epsilon + \lambda \le 1$.



The proof of the recursion formulas for Channel 2 is similar to that of Channel 1 and is omitted. The fixed
points of the functions corresponding to Equations (3) and (4) are given by (0, 0, 0), (1, 0, 0), (0, 1, 0),
(1, 1, 0), (0, 0, 1), (-1, 0, 1), (0, -1, 1) and (-1, -1, 1), out of which (0, 0, 0), (1, 0, 0), (0, 1, 0) and
(0, 0, 1) are admissible. Note that (0, 0, 0) corresponds to a perfect channel with a capacity of $\log_2 6$ bits
per channel use, (1, 0, 0) corresponds to a partially perfect channel which can perfectly send the index of
the coset of the subgroup {0, 3} to which the input belongs and has a capacity of $\log_2 3$ bits per channel
use, (0, 1, 0) corresponds to a partially perfect channel which can perfectly send the index of the coset
of the subgroup {0, 2, 4} to which the input belongs and has a capacity of $\log_2 2$ bits per channel use,
and (0, 0, 1) corresponds to a useless channel. This suggests that in the limit, the transformed channels
converge to one of these four extreme cases. This can be confirmed using the recursion formulas for this
channel as depicted in Figures 5 and 6. With encoding and decoding rules similar to those of Channel
1, we can show that polar codes achieve the symmetric capacity of this channel.
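
The Channel 2 recursion can be explored in the same way as for Channel 1. The sketch below is a direct transcription of Equations (3) and (4) and of the mutual information formula, with parameters ordered as $(\gamma, \epsilon, \lambda)$ to match the fixed points listed above; the function names are hypothetical.

```python
from math import log2


def step2(gam, eps, lam, b):
    """Channel 2 recursion: Equation (3) for b == 1, Equation (4) for b == 0."""
    if b == 1:
        return gam * gam + 2 * gam * lam, eps * eps + 2 * eps * lam, lam * lam
    return (2 * gam - (gam * gam + 2 * gam * lam),
            2 * eps - (eps * eps + 2 * eps * lam),
            2 * lam - lam * lam)


def info2(gam, eps, lam):
    """log2 6 - gamma log2 2 - epsilon log2 3 - lambda log2 6."""
    return log2(6) - gam * log2(2) - eps * log2(3) - lam * log2(6)


def polarize2(gam, eps, lam, n):
    """Mutual informations of all 2^n synthesized channels."""
    params = [(gam, eps, lam)]
    for _ in range(n):
        params = [step2(g, e, l, b) for (g, e, l) in params for b in (0, 1)]
    return [info2(g, e, l) for (g, e, l) in params]


if __name__ == "__main__":
    # Parameters of Fig. 5: gamma = 0, epsilon = 0.4, lambda = 0.2.
    values = polarize2(0.0, 0.4, 0.2, 10)
    print(sum(values) / len(values), info2(0.0, 0.4, 0.2))
```

The four admissible fixed points map to $\log_2 6$, $\log_2 3$, $\log_2 2$ and 0 bits under `info2`, matching the four extreme cases.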
Fig. 5: Polarization of Channel 2 with parameters $\gamma = 0$, $\epsilon = 0.4$, $\lambda = 0.2$. The middle line represents the
subgroup {0, 2, 4} of $\mathbb{Z}_6$.

Fig. 6: Polarization of Channel 2 with parameters $\gamma = 0.4$, $\epsilon = 0$, $\lambda = 0.2$. The middle line represents the
subgroup {0, 3} of $\mathbb{Z}_6$.

In the next section, we show that polar codes achieve the symmetric capacity of channels with input 
alphabet size equal to a power of a prime. 

IV. Polar Codes Over Channels with Input $\mathbb{Z}_{p^r}$

In this section, we consider channels of input alphabet size $q = p^r$ for some prime number $p$ and
a positive integer $r$. In this case, the input alphabet of the channel can be considered as a ring with
addition and multiplication modulo $p^r$. We prove the achievability of the symmetric capacity of these
channels using polar codes and later in Section V, we will generalize this result to channels of arbitrary
input alphabet sizes and arbitrary group operations. We note that $O(\epsilon)$ functions used in this paper do
not depend on the size of the channel output alphabet.

A. $\mathbb{Z}_{p^r}$ Rings

Let $G = \mathbb{Z}_{p^r} = \{0, 1, 2, \ldots, p^r - 1\}$ with addition and multiplication modulo $p^r$ be the input alphabet
of the channel, where $p$ is a prime and $r$ is an integer. For $t = 0, 1, \ldots, r$, define the subgroups $H_t$ of
$G$ as the set:

$$H_t = p^t G = \{0, p^t, 2p^t, \ldots, (p^{r-t} - 1)p^t\}$$

and for $t = 0, 1, \ldots, r$, define the subsets $K_t$ of $G$ as $K_t = H_t \setminus H_{t+1}$ (with the convention $H_{r+1} = \emptyset$); i.e. $K_t$ is defined as the set of
elements of $G$ which are a multiple of $p^t$ but are not a multiple of $p^{t+1}$. Note that $K_0$ is the set of all
invertible elements of $G$ and $K_r = \{0\}$. One can sort the sets $K_0 > K_1 > \cdots > K_r$ in a decreasing
order of "invertibility" of their elements. Let $T_t$ be a transversal of $H_t$ in $G$; i.e. a subset of $G$ containing
one and only one element from each coset of $H_t$ in $G$. One valid choice for $T_t$ is $\{0, 1, \ldots, p^t - 1\}$.
Note that given $H_t$ and $T_t$, each element $g$ of $G$ can be represented uniquely as a sum $g = \hat{g} + \tilde{g}$ where
$\hat{g} \in T_t$ and $\tilde{g} \in H_t$.
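
These sets are easy to enumerate for small parameters. The following sketch uses hypothetical function names and the transversal choice $T_t = \{0, 1, \ldots, p^t - 1\}$ mentioned above.

```python
def H(p, r, t):
    """Subgroup H_t = p^t G of G = Z_{p^r}, listed in increasing order."""
    return list(range(0, p ** r, p ** t))


def K(p, r, t):
    """K_t = H_t minus H_{t+1}: multiples of p^t that are not multiples of p^{t+1}."""
    nxt = set(H(p, r, t + 1)) if t < r else set()
    return [g for g in H(p, r, t) if g not in nxt]


def decompose(g, p, t):
    """Write g = g_hat + g_tilde with g_hat in T_t = {0, ..., p^t - 1} and g_tilde in H_t."""
    g_hat = g % (p ** t)
    return g_hat, g - g_hat


if __name__ == "__main__":
    p, r = 2, 3  # G = Z_8
    for t in range(r + 1):
        print(t, H(p, r, t), K(p, r, t))
    print(decompose(5, p, 1))
```

Reducing $g$ modulo $p^t$ picks its coset representative in $T_t$, and the remainder $g - \hat{g}$ is then a multiple of $p^t$, so it lies in $H_t$.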

B. Recursive Channel Transformation 

1) The Basic Channel Transforms: It has been shown in [1] that the error probability of polar codes
over binary input channels is upper bounded by the sum of Bhattacharyya parameters of certain channels
defined by a recursive channel transformation. The same set of synthesized channels appears for polar
codes over channels with arbitrary input alphabet sizes. The channel transformations are given by:

$$W^-(y_1, y_2 | u_1) = \sum_{u_2' \in G} \frac{1}{q} W(y_1 | u_1 + u_2') W(y_2 | u_2') \tag{5}$$

$$W^+(y_1, y_2, u_1 | u_2) = \frac{1}{q} W(y_1 | u_1 + u_2) W(y_2 | u_2) \tag{6}$$

for $y_1, y_2 \in \mathcal{Y}$ and $u_1, u_2 \in G$. Repeating these operations $n$ times recursively, we obtain $N = 2^n$
channels $W_N^{(1)}, \ldots, W_N^{(N)}$. For $i = 1, \ldots, N$, these channels are given by:

$$W_N^{(i)}(y_1^N, u_1^{i-1} | u_i) = \sum_{u_{i+1}^N \in G^{N-i}} \frac{1}{q^{N-1}} W^N(y_1^N | u_1^N G_N)$$

where $G_N$ is the generator matrix for polar codes.
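
A single application of the transforms (5) and (6) can be examined numerically. The sketch below assumes a hypothetical channel over $\mathbb{Z}_4$ with three outputs, stored as a matrix `W[x][y]`, and checks that the two synthesized channels together preserve twice the symmetric capacity, the one-step identity $I(W^-) + I(W^+) = 2I(W)$ that underlies the martingale property. Function names are illustrative only.

```python
from itertools import product
from math import log2


def sym_capacity(W):
    """Mutual information (bits) of channel W[x][y] under a uniform input."""
    q = len(W)
    total = 0.0
    for y in range(len(W[0])):
        py = sum(W[x][y] for x in range(q)) / q
        for x in range(q):
            if W[x][y] > 0:
                total += W[x][y] / q * log2(W[x][y] / py)
    return total


def minus(W):
    """W^-: input u1, output (y1, y2), as in Equation (5)."""
    q, ny = len(W), len(W[0])
    return [[sum(W[(u1 + u2) % q][y1] * W[u2][y2] for u2 in range(q)) / q
             for y1, y2 in product(range(ny), repeat=2)]
            for u1 in range(q)]


def plus(W):
    """W^+: input u2, output (y1, y2, u1), as in Equation (6)."""
    q, ny = len(W), len(W[0])
    return [[W[(u1 + u2) % q][y1] * W[u2][y2] / q
             for y1, y2, u1 in product(range(ny), range(ny), range(q))]
            for u2 in range(q)]


# Hypothetical channel over Z_4 used only for this check.
W_EXAMPLE = [[0.7, 0.2, 0.1],
             [0.1, 0.7, 0.2],
             [0.2, 0.1, 0.7],
             [0.3, 0.3, 0.4]]

if __name__ == "__main__":
    i0 = sym_capacity(W_EXAMPLE)
    print(sym_capacity(minus(W_EXAMPLE)), i0, sym_capacity(plus(W_EXAMPLE)))
```

The identity holds for any Abelian group operation because the map $(u_1, u_2) \mapsto (u_1 + u_2, u_2)$ is a bijection on $G^2$.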

For the case of binary input channels, it has been shown in [1] that as $N \to \infty$, these channels polarize in
the sense that their Bhattacharyya parameters get either close to zero (perfect channels) or close to one
(useless channels). In the next part, we show that in general, when the input alphabet is a prime power,
polarization happens in multiple levels so that as $N \to \infty$ channels become useless, perfect or "partially
perfect".

For an integer $n$, let $J_n$ be a uniform random variable over the set $\{1, 2, \ldots, N = 2^n\}$ and define the
random variable $I^n(W)$ as

$$I^n(W) = I(X; Y) \tag{7}$$

where $X$ and $Y$ are the input and output of $W_N^{(J_n)}$ respectively and $X$ is uniformly distributed. It has
been shown in [2] that the process $I^0, I^1, I^2, \ldots$ is a martingale; hence $E\{I^n\} = I^0$. For an integer $n$
and for $d \in G$, define the random variable $Z_d^n(W) = Z_d(W_N^{(J_n)})$ where for a channel $(G, \mathcal{Y}, W)$,

$$Z_d(W) = \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \sqrt{W(y|x) W(y|x + d)} = \frac{1}{q} \sum_{x \in G} Z(W_{\{x, x+d\}}) \tag{8}$$


This quantity has been defined in [2]. Other than the processes $I^n(W)$ and $Z_d^n(W)$, in the proof of
polarization, we need another set of processes $I_H^n(W)$ for $H \le G$ which we define in the following.
Let $H$ be an arbitrary subgroup of $G$. Note that any uniform random variable $X$ defined over $G$ can be
decomposed into two uniform and independent random variables $\hat{X}$ and $\tilde{X}$ where $\hat{X}$ takes values from
the transversal $T$ of $H$ and $\tilde{X}$ takes values from $H$. For an integer $n$, define the random variable $I_H^n(W)$
as

$$I_H^n(W) = I(X; Y | \hat{X}) = I(\tilde{X}; Y | \hat{X}) \tag{9}$$

where $X$ and $Y$ are the input and output of $W_N^{(J_n)}$ respectively. The next lemma shows that $I_H^n(W)$ is a
super-martingale.
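
For a single channel, the quantities in (8) and (9) can be computed directly. The sketch below assumes a channel over the cyclic group $\mathbb{Z}_q$ stored as a matrix `W[x][y]`, and uses the identity $I(X; Y | \hat{X}) = I(X; Y) - I(\hat{X}; Y)$; the example channel and all function names are hypothetical.

```python
from math import log2, sqrt


def z_d(W, d):
    """Z_d(W) = (1/q) sum_x sum_y sqrt(W(y|x) W(y|x+d)), as in Equation (8)."""
    q = len(W)
    return sum(sqrt(W[x][y] * W[(x + d) % q][y])
               for x in range(q) for y in range(len(W[0]))) / q


def sym_capacity(W):
    """Mutual information (bits) of channel W[x][y] under a uniform input."""
    q = len(W)
    total = 0.0
    for y in range(len(W[0])):
        py = sum(W[x][y] for x in range(q)) / q
        for x in range(q):
            if W[x][y] > 0:
                total += W[x][y] / q * log2(W[x][y] / py)
    return total


def coset_channel(W, H):
    """Channel from the coset index X_hat to Y: average of W over each coset of H."""
    q = len(W)
    seen, rows = set(), []
    for t in range(q):
        if t in seen:
            continue
        coset = [(t + h) % q for h in H]
        seen.update(coset)
        rows.append([sum(W[x][y] for x in coset) / len(H) for y in range(len(W[0]))])
    return rows


def i_h(W, H):
    """I_H(W) = I(X; Y | X_hat) = I(X; Y) - I(X_hat; Y), as in Equation (9)."""
    return sym_capacity(W) - sym_capacity(coset_channel(W, H))


# Hypothetical Z_4 channel that cannot distinguish u from u + 2.
W_EXAMPLE = [[0.5, 0.3, 0.2],
             [0.1, 0.6, 0.3],
             [0.5, 0.3, 0.2],
             [0.1, 0.6, 0.3]]

if __name__ == "__main__":
    print(z_d(W_EXAMPLE, 2), i_h(W_EXAMPLE, [0, 2]), i_h(W_EXAMPLE, [0, 1, 2, 3]))
```

For this example $Z_2(W)$ equals one and $I_H(W)$ vanishes for $H = \{0, 2\}$, reflecting that the output carries no information about the position of the input inside its coset.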

Lemma IV.1. For an arbitrary group $G$ and for any subgroup $H$ of $G$, the random process $I_H^n(W)$
defined above is a super-martingale.

Proof: Define the channels $W^-$ and $W^+$ as in (5) and (6). Define the random variables $U_1$, $U_2$,
$X_1$, $X_2$, $Y_1$ and $Y_2$ where $U_1$ and $U_2$ are uniformly distributed over $G$, $X_1 = U_1 + U_2$ where addition
is the group operation, $X_2 = U_2$ and $Y_1$ (respectively $Y_2$) is the channel output when the input is $X_1$
(respectively $X_2$). Decompose the random variable $U_1$ into two uniform and independent random variables
$\hat{U}_1$ and $\tilde{U}_1$ where $\hat{U}_1$ takes values from the transversal $T$ of $H$ and $\tilde{U}_1$ takes values from $H$. Similarly
define $\hat{U}_2, \hat{X}_1, \hat{X}_2$ and $\tilde{U}_2, \tilde{X}_1, \tilde{X}_2$. We need to show that

$$I(U_1; Y_1 Y_2 | \hat{U}_1) + I(U_2; Y_1 Y_2 U_1 | \hat{U}_2) \le 2 I(X_1; Y_1 | \hat{X}_1)$$

Note that since $I^n$ is a martingale and $I(X_1; Y_1 | \hat{X}_1) = I(X_1; Y_1) - I(\hat{X}_1; Y_1)$, it suffices to show

$$I(\hat{U}_1; Y_1 Y_2) + I(\hat{U}_2; Y_1 Y_2 U_1) \ge 2 I(\hat{X}_1; Y_1)$$

We have

$$I(\hat{U}_2; Y_1 Y_2 U_1) = I(\hat{U}_2; Y_1 Y_2 \hat{U}_1 \tilde{U}_1) = I(\hat{U}_2; Y_1 Y_2 \hat{U}_1) + I(\hat{U}_2; \tilde{U}_1 | Y_1 Y_2 \hat{U}_1) \ge I(\hat{U}_2; Y_1 Y_2 \hat{U}_1)$$

Hence,

$$I(\hat{U}_1; Y_1 Y_2) + I(\hat{U}_2; Y_1 Y_2 U_1) \ge I(\hat{U}_1; Y_1 Y_2) + I(\hat{U}_2; Y_1 Y_2 \hat{U}_1) = I(\hat{U}_1 \hat{U}_2; Y_1 Y_2) \overset{(a)}{=} I(\hat{X}_1 \hat{X}_2; Y_1 Y_2) = 2 I(\hat{X}_1; Y_1)$$

where (a) follows since $\hat{U}_1$ and $\hat{U}_2$ are recoverable from $\hat{X}_1$ and $\hat{X}_2$. To see this, let $U_1'$ and $U_2'$ take
values from $G$ and let $X_1' = U_1' + U_2'$ and $X_2' = U_2'$. We need to show that if $X_1'$ is in the same coset of
$H$ as $X_1$ (i.e. if $X_1' - X_1 \in H$ or equivalently $\hat{X}_1' = \hat{X}_1$) and $X_2'$ is in the same coset of $H$ as $X_2$ (i.e. if
$X_2' - X_2 \in H$ or equivalently $\hat{X}_2' = \hat{X}_2$), then $U_1'$ is in the same coset of $H$ as $U_1$ (i.e. $U_1' - U_1 \in H$ or
equivalently $\hat{U}_1' = \hat{U}_1$) and $U_2'$ is in the same coset of $H$ as $U_2$ (i.e. $U_2' - U_2 \in H$ or equivalently $\hat{U}_2' = \hat{U}_2$).
Note that $X_2' - X_2 \in H$ implies $U_2' - U_2 \in H$ and $X_1' - X_1 \in H$ implies $U_1' + U_2' - U_1 - U_2 \in H$. Since
$U_2' - U_2 \in H$ (and hence $U_2 - U_2' \in H$), it follows that $U_1' - U_1 \in H + U_2 - U_2' = H$. This concludes
the proof of the lemma. ■

2) Asymptotic Behavior of Synthesized Channels: We restate Lemma 2 of [2] with a slight generalization:

Lemma IV.2. Suppose $B_n$, $n \in \mathbb{Z}^+$ is a $\{-, +\}$-valued process with $P(B_n = -) = P(B_n = +) = \frac{1}{2}$.
Suppose $I_n$ and $T_n$ are two processes adapted to the process $B_n$ satisfying the following conditions:

1) $I_n$ takes values in the interval [0, 1].

2) $I_n$ converges almost surely to a random variable $I_\infty$.

3) $T_n$ takes values in the interval [0, 1].

4) $T_{n+1} = T_n^2$ when $B_{n+1} = +$.

5) If $T_n \le \epsilon$ for all $n$, then $I_n \ge 1 - O(\epsilon)$ for all $n$, in the sense that there exists a function $f$ which
is $O(\epsilon)$ and $T_n \le \epsilon \Rightarrow I_n \ge 1 - f(\epsilon)$ for all $n$.

6) If $T_n \ge 1 - \epsilon$ for all $n$, then $I_n \le O(\epsilon)$ for all $n$, in the sense that there exists a function $g$ which
is $O(\epsilon)$ and $T_n \ge 1 - \epsilon \Rightarrow I_n \le g(\epsilon)$ for all $n$.

Then $I_\infty = \lim_{n \to \infty} I_n$ and $T_\infty = \lim_{n \to \infty} T_n$ both exist with probability 1 and take values in {0, 1}.

Proof: The proof follows from Lemma 2 of [2]. A sufficient condition for $I_n$ to converge is when
$I_n$ is a bounded super-martingale. Note that condition (i&t.1) of Lemma 2 of [2] can be recovered from
the last two conditions of this lemma; we state them in this form to keep the notation consistent throughout the paper. To see
this, note that conditions (5) and (6) imply that there exist functions $f(\cdot), g(\cdot) : \mathbb{R} \to \mathbb{R}$ such that $\lim_{\delta \downarrow 0} f(\delta) = 0$
and $\lim_{\delta \downarrow 0} g(\delta) = 0$ and that $T_n \le \delta$ implies $I_n \ge 1 - f(\delta)$ and $T_n \ge 1 - \delta$ implies $I_n \le g(\delta)$. For an
arbitrary $\epsilon > 0$, since the limit of both functions at zero is zero, let $\delta > 0$ be such that $f(\delta) \le \epsilon$ and
$g(\delta) \le \epsilon$. For this choice of $\delta$ we have

$$T_n \le \delta \Rightarrow I_n \ge 1 - f(\delta) \ge 1 - \epsilon$$

$$T_n \ge 1 - \delta \Rightarrow I_n \le g(\delta) \le \epsilon$$

Hence for any (sufficiently small) $\epsilon > 0$, there exists a $\delta > 0$ such that $T_n \le \delta$ implies $I_n \ge 1 - \epsilon$ and
$T_n \ge 1 - \delta$ implies $I_n \le \epsilon$. Equivalently, for any $\epsilon > 0$, there exists a $\delta > 0$ such that $\epsilon < I_n < 1 - \epsilon$
implies $\delta < T_n < 1 - \delta$. ■

In the next lemma, we show that for any $d \in G$, the random process $Z_d^n$ converges to a Bernoulli
random variable.

Lemma IV.3. For all $d \in G$, $Z_d^n(W)$ converges to a {0, 1}-valued random variable $Z_d^\infty(W)$ as $n$
grows. Moreover, if $\tilde{d} \in G$ is such that $\langle \tilde{d} \rangle = \langle d \rangle$ then $Z_{\tilde{d}}^\infty(W) = Z_d^\infty(W)$ almost surely; i.e. the
random processes $Z_d^n(W)$ and $Z_{\tilde{d}}^n(W)$ converge to the same random variable.

Proof: This lemma has been proved in [2, Theorem 1] for $d = \arg\max_{a \neq 0} Z_a(W)$ when the
underlying group is a field. The proof for an arbitrary $d$ and an arbitrary group is given in the following.
Let $H = \langle d \rangle$ be the subgroup of $G$ generated by $d$ and let $M$ be a maximal subgroup of $H$. Then the
proof provided in [2] suffices for this lemma if we consider the quotient group $H/M$ which is of prime
order. We will elaborate on this in the following: Let

$$d' = \arg\max_{a \in H \setminus M} Z_a(W) \tag{10}$$

In Lemma IV.2, let $I^n$ (here we use the notation $I^n$ instead of $I_n$ for notational convenience) be equal
to the process $I_H^n(W) - I_M^n(W)$ where $I_H^n(W)$ and $I_M^n(W)$ are defined by Equation (9) and let $T_n$ be
equal to the process $Z_{d'}^n(W)$ defined in (8). We claim that $I^n$ and $T_n$ satisfy the conditions of Lemma
IV.2. The proof is given in the following:

Note that in the case of $\mathbb{Z}_p$ fields, the only maximal subgroup of the group is the trivial subgroup
{0}. Hence, (10) can be viewed as a straightforward generalization of the definition made in [2].



Let $M$ be a maximal subgroup of $H = \langle d \rangle$. Recall that a uniform random variable $X$ over $G$ can
be decomposed into two uniform and independent random variables $\tilde{X}$ taking values from $H$ and $\hat{X}$
taking values from the transversal of $H$ in $G$. Similarly, the uniform random variable $\tilde{X}$ over $H$ can be
decomposed into two uniform and independent random variables $\tilde{\tilde{X}}$ taking values from $M \le H$ and $\hat{\tilde{X}}$
taking values from the transversal of $M$ in $H$. Using the chain rule we have:

$$I(\tilde{X}; Y | \hat{X}) = I(\hat{\tilde{X}} \tilde{\tilde{X}}; Y | \hat{X}) = I(\hat{\tilde{X}}; Y | \hat{X}) + I(\tilde{\tilde{X}}; Y | \hat{X} \hat{\tilde{X}})$$

Note that $\tilde{\tilde{X}} \in M$ and $(\hat{X}, \hat{\tilde{X}})$ indicate the coset of $M$ in $G$ to which $X$ belongs. Therefore, the equation
above implies that for each $n$, $I_H^n(W) - I_M^n(W) = I(\hat{\tilde{X}}; Y | \hat{X})$ where $X$ and $Y$ are the input and the
output of the channel $W_N^{(J_n)}$. Since $\hat{\tilde{X}}$ can take at most $\frac{|H|}{|M|}$ values, by choosing the base of the log
function to be equal to $\frac{|H|}{|M|}$, condition (1) of Lemma IV.2 is satisfied.

We have shown in Lemma IV.1 that both processes $I_H^n(W)$ and $I_M^n(W)$ are super-martingales and
hence both converge almost surely. This means that the vector valued random process $(I_H^n(W), I_M^n(W))$
converges almost surely (refer to Proposition 5.25 of [8]). Hence condition (2) is satisfied.

Condition (3) trivially holds and condition (4) is shown to be satisfied in the proof of Theorem 1
of [2].

To show (5), assume $Z_{d'}(W) \le \epsilon$. Let $T_H$ be a transversal of $H$ in $G$ and let $T_M$ be a transversal
of $M$ in $H$. Given $X \in t_H + H$ for some $t_H \in T_H$, the joint probability distribution of cosets of $M$ in
$t_H + H$ and the channel output is given by:

$$\begin{aligned}
p(t_H + t_M + M, y) &= \sum_{m \in M} P(X = t_H + t_M + m, Y = y \mid X \in t_H + H) \\
&= \sum_{m \in M} \frac{P(X = t_H + t_M + m, Y = y)}{P(X \in t_H + H)} \\
&= \sum_{m \in M} \frac{P(X = t_H + t_M + m, Y = y)}{|H| / |G|} \\
&= \frac{|G|}{|H|} \sum_{m \in M} \frac{1}{|G|} W(y \mid t_H + t_M + m) \\
&= \frac{1}{|H|} \sum_{m \in M} W(y \mid t_H + t_M + m)
\end{aligned}$$

where $t_M$ takes values from $T_M$. The corresponding channel is defined as:

$$\bar{W}(y \mid t_H + t_M + M) = \frac{p(t_H + t_M + M, y)}{P(t_H + t_M + M \mid X \in t_H + H)} = \frac{1}{|M|} \sum_{m \in M} W(y \mid t_H + t_M + m) \tag{11}$$

Note that the input of this channel takes values from the set $\{t_H + t_M + M \mid t_M \in T_M\}$ uniformly and
the size of the input alphabet is $\frac{|H|}{|M|}$, which is a prime (since $M$ is maximal in $H$). Furthermore, by
definition $I(\bar{W}) = I(\hat{\tilde{X}}; Y \mid \hat{X} = t_H)$. It is shown in Appendix C that $Z_{d'}(W) \le \epsilon$ implies $Z(\bar{W}) \le C\epsilon$
for some constant $C$. Therefore, [2, Prop. 3] implies $I(\bar{W}) = \log \frac{|H|}{|M|} - O(\epsilon)$. This result is
valid for all $t_H \in T_H$. Therefore

$$I_H(W) - I_M(W) = \sum_{t_H \in T_H} P(\hat{X} = t_H) I(\hat{\tilde{X}}; Y \mid \hat{X} = t_H) = \log \frac{|H|}{|M|} - O(\epsilon)$$


To show condition (6), assume that $Z_{d'}^n(W) > 1 - \epsilon$. For the channel $\bar{W}$ defined in (11), it is shown in Appendix D (an alternate proof for the $\mathbb{Z}_{p^r}$ case can be found in Appendix E) that $Z_{d'}(W) > 1 - \epsilon$ implies $Z_{d' + t_H + M}(\bar{W}) > 1 - O(\epsilon)$. Since the input alphabet of the channel $\bar{W}$ has a prime size and $d' \in H \setminus M$, we can use [2, Lemma 4] to conclude that $Z(\bar{W}) > 1 - O(\epsilon)$. Now we use [2, Prop. 3] to conclude $I(\bar{W}) < O(\epsilon)$. This implies:

$$I_H(W) - I_M(W) = \sum_{t_H \in T_H} P(X \in t_H + H)\, I(\bar{W}) \le O(\epsilon)$$

So far, we have shown that for any $d \in G$, for $H = \langle d \rangle$ and $d'$ defined as in (10), the random variable $Z_{d'}^n(W)$ converges to a Bernoulli random variable. Note that so far the proof is general and applies to arbitrary groups as well. We will use this part of the proof later in Section V. Next, we show that when $G = \mathbb{Z}_{p^r}$, for any $\tilde{d} \in H \setminus M$ (including $d$ itself), $Z_{\tilde{d}}^n(W)$ converges to a Bernoulli random variable. Moreover, all such $\tilde{d}$'s converge to the same random variable. To see this, note that if $Z_{d'} < \epsilon$, it follows that $Z_{\tilde{d}} < \epsilon$ for all $\tilde{d} \in H \setminus M$, and if $Z_{d'} > 1 - \epsilon$ we show that for all $\tilde{d} \in \langle d' \rangle = H$, $Z_{\tilde{d}} > 1 - O(\epsilon)$.

For any $\tilde{d} \in H = \langle d' \rangle$ we can write $\tilde{d} = i d'$ for some integer $i$. The condition $Z_{d'} > 1 - \epsilon$ implies $1 - Z(W_{\{x, x + d'\}}) \le q\epsilon$ for all $x \in G$. It has been shown in the proof of [2, Lemma 4] that

$$\sqrt{1 - Z(W_{\{x, x + 2d'\}})} \le \sqrt{1 - Z(W_{\{x, x + d'\}})} + \sqrt{1 - Z(W_{\{x + d', x + 2d'\}})} \le 2\sqrt{q\epsilon}$$

Repeated application of this inequality for $i$ times yields $\sqrt{1 - Z(W_{\{x, x + \tilde{d}\}})} \le i\sqrt{q\epsilon} \le q\sqrt{q\epsilon}$, or equivalently $Z(W_{\{x, x + \tilde{d}\}}) \ge 1 - q^3\epsilon$. It then follows that $Z_{\tilde{d}} \ge 1 - q^3\epsilon$. Note that when $G = \mathbb{Z}_{p^r}$, $H \setminus M$ is the set of all elements $\tilde{d}$ such that $\langle \tilde{d} \rangle = \langle d \rangle$. This completes the proof of the lemma. ■

The next lemma gives a sufficient condition for two processes $Z_d^n$ and $Z_{\tilde{d}}^n$ to converge to the same random variable. Recall that for $0 \le t \le r - 1$, $K_t = H_t \setminus H_{t+1}$.

Lemma IV.4. If $d, \tilde{d} \in K_t$ for some $0 \le t \le r - 1$, then $Z_d^n$ and $Z_{\tilde{d}}^n$ converge to the same Bernoulli random variable.

Proof: Note that $d, \tilde{d} \in K_t$ implies $\langle d \rangle = \langle \tilde{d} \rangle = H_t$. Therefore, Lemma IV.3 implies $Z_d^n$ and $Z_{\tilde{d}}^n$ converge to the same Bernoulli random variable. ■

For $t = 0, 1, \dots, r - 1$, pick an arbitrary element $k_t \in K_t$. The lemma above suggests that we only need to study the $Z_{k_t}$'s rather than all $Z_d$'s.
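The structure used in Lemma IV.4 can be checked numerically. The following is a minimal sketch, assuming $H_t = p^t \mathbb{Z}_{p^r}$ (consistent with the subgroups listed for $\mathbb{Z}_4$ in Example 1); the function names are illustrative and not part of the paper.

```python
def subgroup_generated(d, q):
    """Cyclic subgroup <d> of Z_q, returned as a frozenset."""
    return frozenset((i * d) % q for i in range(q))


def chain_and_shells(p, r):
    """Return (H, K) for G = Z_{p^r}.

    H[t] = p^t * Z_{p^r} for t = 0..r (assumed definition of H_t),
    K[t] = H[t] minus H[t+1] for t = 0..r-1.
    """
    q = p ** r
    H = [subgroup_generated((p ** t) % q, q) for t in range(r + 1)]
    K = [H[t] - H[t + 1] for t in range(r)]
    return H, K


# Example: G = Z_8 (p = 2, r = 3).
H, K = chain_and_shells(2, 3)

# Every element of the shell K_t generates exactly H_t.
same_generator = all(
    subgroup_generated(d, 8) == H[t] for t in range(3) for d in K[t]
)
```

Under this assumption, every $d \in K_t$ satisfies $\langle d \rangle = H_t$, which is the fact the proof of Lemma IV.4 relies on.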

Lemma IV.5. If $Z_{k_t} > 1 - \epsilon$ then $Z_{k_s} \approx_\epsilon 1$ for all $t \le s \le r - 1$.

Proof: Note that $k_s \in \langle k_t \rangle$ and let $d = k_t$ and $k_s = i d$ for some integer $i$. The condition $Z_{k_t} > 1 - \epsilon$ implies $1 - Z(W_{\{x, x + d\}}) \le q\epsilon$ for all $x \in G$. It has been shown in the proof of [2, Lemma 4] that for all $x \in G$

$$\sqrt{1 - Z(W_{\{x, x + 2d\}})} \le 2\sqrt{q\epsilon}$$

Repeated application of this inequality for $i$ times yields $\sqrt{1 - Z(W_{\{x, x + i d\}})} \le i\sqrt{q\epsilon}$ for all $x \in G$. It follows that $Z_{k_s} \ge 1 - O(\epsilon)$. ■

This lemma implies that for the group $G = \mathbb{Z}_{p^r}$ all possible asymptotic cases are:

• Case 0: $Z_{k_0} = 1, Z_{k_1} = 1, Z_{k_2} = 1, \dots, Z_{k_{r-1}} = 1$

• Case 1: $Z_{k_0} = 0, Z_{k_1} = 1, Z_{k_2} = 1, \dots, Z_{k_{r-1}} = 1$

• Case 2: $Z_{k_0} = 0, Z_{k_1} = 0, Z_{k_2} = 1, \dots, Z_{k_{r-1}} = 1$

• Case r: $Z_{k_0} = 0, Z_{k_1} = 0, Z_{k_2} = 0, \dots, Z_{k_{r-1}} = 0$,

where for $t = 0, \dots, r$, case $t$ happens with some probability $p_t$.
Next, we study the behavior of $I^n$ in each of these asymptotic cases.

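Lemma IV.6. For a channel $(\mathbb{Z}_{p^r}, \mathcal{Y}, W)$ and for $t = 0, 1, \dots, r$, if $Z_{k_0} < \epsilon, Z_{k_1} < \epsilon, \dots, Z_{k_{t-1}} < \epsilon, Z_{k_t} > 1 - \epsilon, \dots, Z_{k_{r-1}} > 1 - \epsilon$, then $t \log p - O(\epsilon) \le I^0(W) \le t \log p + O(\epsilon)$.

Proof: Note that for all $s = 0, \dots, r - 1$, $M_s = \langle k_{s+1} \rangle = H_{s+1}$ is a maximal subgroup of $\langle k_s \rangle = H_s$. In the proof of Lemma IV.3, if we let $d = k_0$ and $M_0 = \langle k_1 \rangle$, we get $I_G(W) - I_{M_0}(W) = I(W) - I_{M_0}(W) \approx_\epsilon \log p$ (here we take the base of the log function to be equal to 2). Similarly, it follows that $I_{H_s}(W) - I_{H_{s+1}}(W) \approx_\epsilon \log p$ for all $0 \le s \le t - 1$. For $s \ge t$ we have $I_{H_s}(W) - I_{H_{s+1}}(W) \approx_\epsilon 0$. Therefore, since $I_{H_0}(W) = I(W)$ and $I_{H_r}(W) = I_{\{0\}}(W) = 0$,

$$\begin{aligned}
I(W) &= \sum_{s=0}^{r-1} \left[ I_{H_s}(W) - I_{H_{s+1}}(W) \right] \\
&= \sum_{s=0}^{t-1} \left[ I_{H_s}(W) - I_{H_{s+1}}(W) \right] + \sum_{s=t}^{r-1} \left[ I_{H_s}(W) - I_{H_{s+1}}(W) \right] \\
&\approx_\epsilon \sum_{s=0}^{t-1} \log p + \sum_{s=t}^{r-1} 0 \\
&= t \log p
\end{aligned}$$
■

We have shown that the process $I^n$ converges to the following $(r + 1)$-valued discrete random variable: $I^\infty = t \log p$ with probability $p_t$ for $t = 0, \dots, r$.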

For $t = 0, \dots, r$, define the random variable $Z^t(W_N^{(i)}) = \sum_{d \notin H_t} Z_d(W_N^{(i)})$ and the random process $(Z^t)^{(n)}(W) = Z^t(W_N^{(J_n)})$ where $J_n$ is a uniform random variable over $\{1, 2, \dots, N = 2^n\}$. Note that $(Z^t)^{(n)}(W)$ converges to a random variable $(Z^t)^{(\infty)}(W)$ almost surely and $P\left((Z^t)^{(\infty)} = 0\right) = \sum_{s=t}^{r} p_s$.

3) Summary of Channel Transformation: For the channel $(\mathbb{Z}_{p^r}, \mathcal{Y}, W)$, consider the vector random process $V^n = (Z_{k_0}^n, Z_{k_1}^n, \dots, Z_{k_{r-1}}^n, I^n)$. We have seen in the previous section that each component of this vector random process converges almost surely. Proposition 5.25 of [8] implies that the vector random process $V^n$ also converges almost surely to a random vector $V^\infty$. The random vector $V^\infty$ is a discrete random variable defined as follows:

$$P\Big(V^\infty = (\underbrace{0, \dots, 0}_{t \text{ times}}, \underbrace{1, \dots, 1}_{r - t \text{ times}}, t \log p)\Big) = p_t$$

for $t = 0, 1, \dots, r$ where the $p_t$'s are some probabilities. This implies that for all $\epsilon > 0$, there exists a number $N = N(\epsilon) = 2^{n(\epsilon)}$ and a partition $\{A_0^\epsilon, A_1^\epsilon, \dots, A_r^\epsilon\}$ of $\{1, \dots, N\}$ such that for $t = 0, \dots, r$ and $i \in A_t^\epsilon$, $Z_{k_s}(W_N^{(i)}) < O(\epsilon)$ if $0 \le s < t$ and $Z_{k_s}(W_N^{(i)}) > 1 - O(\epsilon)$ if $t \le s < r$. For $t = 0, \dots, r$ and $i \in A_t^\epsilon$, we have $I(W_N^{(i)}) = t \log(p) + O(\epsilon)$ and $Z^t(W_N^{(i)}) = O(\epsilon)$. Moreover, as $\epsilon \to 0$, $\frac{|A_t^\epsilon|}{N} \to p_t$ for some probabilities $p_0, \dots, p_r$.

In Appendix F we show that for any $\beta < \frac{1}{2}$ and for $t = 0, \dots, r$,

$$\lim_{n \to \infty} P\left((Z^t)^{(n)} < 2^{-2^{\beta n}}\right) \ge P\left((Z^t)^{(\infty)} = 0\right) = \sum_{s=t}^{r} p_s \qquad (12)$$

Remark IV.1. This observation implies the following stronger result: For all $\epsilon > 0$, there exists a number $N = N(\epsilon) = 2^{n(\epsilon)}$ and a partition $\{A_0^\epsilon, A_1^\epsilon, \dots, A_r^\epsilon\}$ of $\{1, \dots, N\}$ such that for $t = 0, \dots, r$ and $i \in A_t^\epsilon$, $I(W_N^{(i)}) = t \log(p) + O(\epsilon)$ and $Z^t(W_N^{(i)}) < 2^{-2^{\beta n(\epsilon)}}$. Moreover, as $\epsilon \to 0$, $\frac{|A_t^\epsilon|}{N} \to p_t$ for some probabilities $p_0, \dots, p_r$.

C. Encoding and Decoding 

In the original construction of polar codes, we fix the input symbols corresponding to useless channels and send information symbols over perfect channels. Here, since the channels do not polarize into two levels, the encoding is slightly different and we send "some" information bits over "partially perfect" channels. At the encoder, if $i \in A_t^\epsilon$ for some $t = 0, \dots, r$, the information symbol is chosen from the transversal $T_t$ arbitrarily and not from the whole set $G$. As we will see later, the channel $W_N^{(i)}$ is perfect for symbols chosen from $T_t$ and perfect decoding is possible at the decoder. Let $\mathcal{X}_\epsilon^N = \bigoplus_{t=0}^{r} T_t^{A_t^\epsilon}$ be the set of all valid input sequences. For the sake of analysis, as in the binary case, the message is dithered with a uniformly distributed random vector $b_1^N \in \bigoplus_{t=0}^{r} H_t^{A_t^\epsilon}$ revealed to both the encoder and the decoder. A message $v_1^N \in \mathcal{X}_\epsilon^N$ is encoded to the vector $x_1^N = (v_1^N + b_1^N) G_N$. Note that $u_1^N = v_1^N + b_1^N$ is uniformly distributed over $G^N$.

At the decoder, after observing the output vector $y_1^N$, for $t = 0, \dots, r$ and $i \in A_t^\epsilon$, use the following decoding rule:

$$\hat{u}_i = f_i(y_1^N, \hat{u}_1^{i-1}) = \operatorname*{argmax}_{g \in b_i + T_t} W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid g)$$

And finally, the message is decoded as $\hat{v}_1^N = \hat{u}_1^N - b_1^N$.

The total number of valid input sequences is equal to

$$2^{NR} = \prod_{t=0}^{r} |T_t|^{|A_t^\epsilon|} = \prod_{t=0}^{r} p^{t |A_t^\epsilon|} \approx \prod_{t=0}^{r} p^{t p_t N}$$

Therefore, the rate is equal to $R = \sum_{t=0}^{r} p_t\, t \log p$. On the other hand, since $I^n$ is a martingale, we have $E\{I^\infty\} = I^0$. Since $E\{I^\infty\} = \sum_{t=0}^{r} p_t\, t \log p$, we observe that the rate $R$ is equal to the symmetric capacity $I^0$. We will see in the next section that this rate is achievable.
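The rate computation above can be illustrated with a short sketch. It assumes $H_t = p^t \mathbb{Z}_{p^r}$, so that the residues $0, \dots, p^t - 1$ form one valid transversal $T_t$; the probability vector used below is hypothetical and chosen only for illustration.

```python
import math


def transversal(p, r, t):
    """One transversal of H_t = p^t Z_{p^r} in Z_{p^r}: the residues 0..p^t - 1.

    Assumes the subgroup chain H_t = p^t Z_{p^r}; any other choice of coset
    representatives would serve equally well.
    """
    return list(range(p ** t))


def rate_bits(p, probs):
    """R = sum_t p_t * t * log2(p), in bits per channel use."""
    return sum(pt * t * math.log2(p) for t, pt in enumerate(probs))


# G = Z_4: T_1 has one representative per coset of {0, 2}.
T1 = transversal(2, 2, 1)

# Hypothetical limiting probabilities (p_0, p_1, p_2) for a channel over Z_4.
R = rate_bits(2, [0.1, 0.3, 0.6])
```

With these illustrative numbers, an index in $A_1^\epsilon$ carries one bit (a choice from $T_1$) and an index in $A_2^\epsilon$ carries two bits, which is what the sum $\sum_t p_t\, t \log p$ counts.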
D. Error Analysis

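Let $B_i$ be the event that the first error occurs when the decoder decodes the $i$th symbol:

$$B_i = \{(u_1^N, y_1^N) \in G^N \times \mathcal{Y}^N \mid \forall j < i: \hat{u}_j = u_j,\ \hat{u}_i \ne u_i\} \subseteq \{(u_1^N, y_1^N) \in G^N \times \mathcal{Y}^N \mid f_i(y_1^N, u_1^{i-1}) \ne u_i\} \qquad (13)$$

For $t = 0, \dots, r$ and $i \in A_t^\epsilon$, define

$$E_i = \{(u_1^N, y_1^N) \in G^N \times \mathcal{Y}^N \mid W_N^{(i)}(y_1^N, u_1^{i-1} \mid u_i) \le W_N^{(i)}(y_1^N, u_1^{i-1} \mid \tilde{u}_i) \text{ for some } \tilde{u}_i \in b_i + T_t,\ \tilde{u}_i \ne u_i\} \qquad (14)$$

Lemma IV.7. For $t = 0, \dots, r$ and $i \in A_t^\epsilon$, $P(E_i) \le q^2 Z^t(W_N^{(i)})$.

Proof: For $u_i \in G$, write $u_i = b_i(u_i) + v_i(u_i)$ where $b_i(u_i) \in H_t$ and $v_i(u_i) \in T_t$. We have

$$\begin{aligned}
P(E_i) &= \sum_{u_1^N, y_1^N} \frac{1}{q^N} W_N(y_1^N \mid u_1^N)\, \mathbb{1}_{E_i}(u_1^N, y_1^N) \\
&\le \sum_{u_1^N, y_1^N} \frac{1}{q^N} W_N(y_1^N \mid u_1^N) \sum_{\substack{\tilde{u}_i \in b_i(u_i) + T_t \\ \tilde{u}_i \ne u_i}} \sqrt{\frac{W_N^{(i)}(y_1^N, u_1^{i-1} \mid \tilde{u}_i)}{W_N^{(i)}(y_1^N, u_1^{i-1} \mid u_i)}} \\
&= \sum_{u_1^i, y_1^N} \frac{1}{q} W_N^{(i)}(y_1^N, u_1^{i-1} \mid u_i) \sum_{\substack{\tilde{u}_i \in b_i(u_i) + T_t \\ \tilde{u}_i \ne u_i}} \sqrt{\frac{W_N^{(i)}(y_1^N, u_1^{i-1} \mid \tilde{u}_i)}{W_N^{(i)}(y_1^N, u_1^{i-1} \mid u_i)}} \\
&= \sum_{u_i} \sum_{\substack{\tilde{u}_i \in b_i(u_i) + T_t \\ \tilde{u}_i \ne u_i}} \frac{1}{q} \sum_{y_1^N, u_1^{i-1}} \sqrt{W_N^{(i)}(y_1^N, u_1^{i-1} \mid u_i)\, W_N^{(i)}(y_1^N, u_1^{i-1} \mid \tilde{u}_i)} \\
&= \sum_{u_i} \sum_{\substack{\tilde{u}_i \in b_i(u_i) + T_t \\ \tilde{u}_i \ne u_i}} \frac{1}{q} Z_{\{u_i, \tilde{u}_i\}}(W_N^{(i)})
\end{aligned}$$

For $u_i \in G$ and $\tilde{u}_i \in b_i(u_i) + T_t$, if $\tilde{u}_i \ne u_i$, then $u_i, \tilde{u}_i$ are not in the same coset of $H_t$ and hence $u_i - \tilde{u}_i \notin H_t$. Therefore, $u_i - \tilde{u}_i \in G \setminus H_t$. Note that for $d = \tilde{u}_i - u_i$, $Z_{\{u_i, \tilde{u}_i\}}(W_N^{(i)}) \le q Z_d(W_N^{(i)})$. Since $d \in G \setminus H_t$, we have $Z_d(W_N^{(i)}) \le Z^t(W_N^{(i)})$ and hence,

$$\frac{1}{q} Z_{\{u_i, \tilde{u}_i\}}(W_N^{(i)}) \le Z^t(W_N^{(i)})$$

Therefore, $P(E_i) \le q |T_t| Z^t(W_N^{(i)}) \le q^2 Z^t(W_N^{(i)})$. ■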
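The probability of block error is given by $P(\mathrm{err}) = \sum_{t=0}^{r} \sum_{i \in A_t^\epsilon} P(B_i)$. Since $B_i \subseteq E_i$, we get

$$P(\mathrm{err}) \le \sum_{t=0}^{r} \sum_{i \in A_t^\epsilon} q^2 Z^t(W_N^{(i)}) \qquad (15)$$

$$\overset{(a)}{\le} \sum_{t=0}^{r} |A_t^\epsilon|\, q^2\, 2^{-2^{\beta n}} \qquad (16)$$

$$\le q^2 N 2^{-2^{\beta n}} \qquad (17)$$

for any $\beta < \frac{1}{2}$ where (a) follows from Remark IV.1. Therefore, the probability of error goes to zero as $\epsilon \to 0$ (and hence $n \to \infty$).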
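To see how quickly the bound in (17) decays, one can evaluate its base-2 logarithm, $2\log_2 q + n - 2^{\beta n}$, for a few block lengths. This is a small numerical sketch; the values of $q$, $n$ and $\beta$ below are arbitrary illustrations, not figures from the paper.

```python
import math


def log2_error_bound(q, n, beta):
    """log2 of q^2 * N * 2^(-2^(beta*n)) with N = 2^n.

    Working in the log domain avoids floating-point underflow for large n.
    """
    return 2 * math.log2(q) + n - 2 ** (beta * n)


# Illustrative parameters: q = 4 (alphabet Z_4), beta = 0.4 < 1/2.
bound_n10 = log2_error_bound(4, 10, 0.4)
bound_n20 = log2_error_bound(4, 20, 0.4)
```

The doubly exponential term $2^{\beta n}$ dominates the linear term $n$, so the logarithm of the bound tends to $-\infty$ as $n$ grows.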

V. Polar Codes Over Arbitrary Channels 

For any channel input alphabet there always exists an Abelian group of the same size. In this section, we generalize the result of the previous section to channels of arbitrary input alphabet sizes and arbitrary group operations.

A. Abelian Groups

Let the Abelian group $G$ be the input alphabet of the channel. It is a standard fact that any finite Abelian group can be decomposed into a direct sum of $\mathbb{Z}_{p^r}$ rings [9]. Let $G = \bigoplus_{l=1}^{L} R_l$ with $R_l = \mathbb{Z}_{p_l^{r_l}}$ where the $p_l$'s are prime numbers and the $r_l$'s are positive integers. For $t = (t_1, t_2, \dots, t_L)$ with $t_l \in \{0, 1, \dots, r_l\}$, there exists a corresponding subgroup $H$ of $G$ defined by $H = \bigoplus_{l=1}^{L} p_l^{t_l} R_l$. For a subgroup $H$ of $G$ define $T_H$ to be a transversal of $H$ in $G$.

B. Recursive Channel Transformation 

1) The Basic Channel Transforms: The transformed channels $W^+$ and $W^-$ and the process $I^n(W)$ are defined the same way as in the $\mathbb{Z}_{p^r}$ case through Equations (5), (6) and (7).

2) Asymptotic Behavior of Synthesized Channels: For $d \in G$, define $Z_d(W)$ the same as in (8) where $q = |G|$, and for $H \le G$, define $I_H(W)$ by Equation (9). To prove the polarization for arbitrary groups, we need the following lemma:

Lemma V.1. For $d_1, d_2 \in G$, if $Z_{d_1}(W) > 1 - \epsilon$ and $Z_{d_2}(W) > 1 - \epsilon$, then $Z_d(W) \approx_\epsilon 1$ for any $d \in \langle d_1, d_2 \rangle$ where $\langle d_1, d_2 \rangle$ is the subgroup of $G$ generated by $d_1$ and $d_2$.

Proof: The condition $Z_{d_1} > 1 - \epsilon$ implies $1 - Z(W_{\{x, x + d_1\}}) \le q\epsilon$ and the condition $Z_{d_2} > 1 - \epsilon$ implies $1 - Z(W_{\{x, x + d_2\}}) \le q\epsilon$. Similar to the proof of Lemma IV.5 we have, for $d \in \{d_1, d_2\}$,

$$\sqrt{1 - Z(W_{\{x, x + 2d\}})} \le 2\sqrt{q\epsilon}$$

It is also straightforward to show that

$$\sqrt{1 - Z(W_{\{x, x + d_1 + d_2\}})} \le 2\sqrt{q\epsilon}$$

Since $d \in \langle d_1, d_2 \rangle$, it can be written as $d = i d_1 + j d_2$ for some integers $i, j$. Repeated application of the above inequalities yields the lemma. ■

Remark V.1. This lemma is generalizable to the case where for $d_1, \dots, d_m \in G$, $Z_{d_1}(W) > 1 - \epsilon, Z_{d_2}(W) > 1 - \epsilon, \dots, Z_{d_m}(W) > 1 - \epsilon$. In this case, we have $Z_d(W) \approx_\epsilon 1$ for any $d \in \langle d_1, d_2, \dots, d_m \rangle$.

The following lemma is a restatement of Lemma IV.3. Here, we prove it for arbitrary groups.



Lemma V.2. For all $d \in G$, $Z_d^n(W)$ converges to a $\{0, 1\}$-valued random variable $Z_d^\infty(W)$ as $n$ grows. Moreover, if $\tilde{d} \in G$ is such that $\langle \tilde{d} \rangle = \langle d \rangle$ then $Z_{\tilde{d}}^\infty(W) = Z_d^\infty(W)$ almost surely; i.e. the random processes $Z_d^n(W)$ and $Z_{\tilde{d}}^n(W)$ converge to the same random variable.

Proof: Similar to the proof of Lemma IV.3, let $H = \langle d \rangle$ and let $M$ be any maximal subgroup of $H$. Define

$$d' = \operatorname*{argmax}_{a \in H \setminus M} Z_a(W) \qquad (18)$$

It is relatively straightforward to show that in the general case as well, $Z_{d'}^n(W)$ converges to a $\{0, 1\}$-valued random variable $Z_{d'}^\infty(W)$. Indeed this part of the proof of Lemma IV.3 is general enough for arbitrary Abelian groups. Here we show that this implies $Z_d^n(W)$ also converges to a Bernoulli random variable.

Let $|H| = \prod_{i=1}^{k} q_i^{a_i}$ where the $q_i$'s are distinct primes and the $a_i$'s are positive integers. Note that $H$ is isomorphic to the cyclic group $\mathbb{Z}_{|H|}$. For $i = 1, \dots, k$, define the subgroup $M_i = \langle q_i \rangle$ of $\mathbb{Z}_{|H|}$ (and isomorphically of $H$) and let $d'_i = \operatorname*{argmax}_{a \in H \setminus M_i} Z_a(W)$. Note that for $i = 1, \dots, k$, $M_i$ is a maximal subgroup of $\mathbb{Z}_{|H|}$ (and isomorphically of $H$). Therefore, for $i = 1, \dots, k$, $Z_{d'_i}^n(W)$ converges to a $\{0, 1\}$-valued random variable. If for some $i = 1, \dots, k$, $Z_{d'_i}(W) < \epsilon$, it follows that $Z_d(W) < \epsilon$ (since $d \in H \setminus M_i$), and if for all $i = 1, \dots, k$, $Z_{d'_i}(W) > 1 - \epsilon$, it follows from Remark V.1 that $Z_{\tilde{d}}(W) > 1 - O(\epsilon)$ for any $\tilde{d} \in \langle d'_1, d'_2, \dots, d'_k \rangle$. Next, we show that $\langle d'_1, d'_2, \dots, d'_k \rangle = H$ and this will prove that if for all $i = 1, \dots, k$, $Z_{d'_i}(W) > 1 - \epsilon$ then $Z_d(W) > 1 - O(\epsilon)$. For $i = 1, \dots, k$, since $d'_i \notin M_i$ it follows that $d'_i \not\equiv 0 \pmod{q_i}$. Define

$$\delta = \sum_{i=1}^{k} \Big( \prod_{j \ne i} q_j \Big) d'_i$$

Then we have $\delta \not\equiv 0 \pmod{q_i}$ for all $i = 1, \dots, k$. This implies $\langle \delta \rangle = H$ and hence $\langle d'_1, d'_2, \dots, d'_k \rangle = H$. Therefore, if in the limit $Z_{d'_i}(W) = 0$ for some $i = 1, \dots, k$ then $Z_d(W) = 0$, and if $Z_{d'_i}(W) = 1$ for all $i = 1, \dots, k$ then $Z_d(W) = 1$. This proves that $Z_d^n(W)$ converges to a Bernoulli random variable.

If $\tilde{d} \in G$ is such that $\langle \tilde{d} \rangle = \langle d \rangle$ then it follows that $\tilde{d} \in H$ and $\tilde{d} \notin M_i$ for $i = 1, \dots, k$. Therefore, if in the limit $Z_{d'_i}(W) = 0$ for some $i = 1, \dots, k$ then $Z_{\tilde{d}}(W) = 0$, and if $Z_{d'_i}(W) = 1$ for all $i = 1, \dots, k$ then $Z_{\tilde{d}}(W) = 1$. This proves that the random processes $Z_d^n(W)$ and $Z_{\tilde{d}}^n(W)$ converge to the same random variable. ■

In the asymptotic regime, let $d_1, d_2, \dots, d_m$ be all elements of $G$ such that $Z_{d_i}(W) = 1$ and assume that for all other elements $d \in G$, $Z_d(W) = 0$ (we can make this assumption since in the limit the $Z_d$'s are $\{0, 1\}$-valued). We have seen that if $Z_{d_i}(W) = 1$ for $i = 1, \dots, m$ then for any $d \in \langle d_1, d_2, \dots, d_m \rangle$, $Z_d(W) = 1$. Therefore, $\langle d_1, d_2, \dots, d_m \rangle \subseteq \{d_1, d_2, \dots, d_m\}$ and hence $\{d_1, d_2, \dots, d_m\} = \langle d_1, d_2, \dots, d_m \rangle = H$ for some subgroup $H$ of $G$. This means all possible asymptotic cases can be indexed by subgroups of $G$; i.e. for any $H \le G$, one possible asymptotic case is

• Case H: $Z_d(W) = 1$ if $d \in H$, and $Z_d(W) = 0$ otherwise,

where for $H \le G$, case $H$ happens with some probability $p_H$.

Next, we study the behavior of $I^n$ in each of these cases.



Lemma V.3. For a channel $(G, \mathcal{Y}, W)$ and for a subgroup $S$ of $G$, if $Z_d > 1 - \epsilon$ for $d \in S^*$ and $Z_d < \epsilon$ for $d \notin S$, then $I^0(W) \approx_\epsilon \log\frac{|G|}{|S|}$.

Proof: Let $\{0\} = M_0 \subset M_1 \subset \dots \subset M_{t-1} \subset S = M_t \subset M_{t+1} \subset \dots \subset G = M_k$ for some positive integer $k$ be any chain of subgroups such that $M_{s-1}$ is maximal in $M_s$ for $s = 1, \dots, k$.

For $s = 1, \dots, t$ let $H = M_s$ and $M = M_{s-1}$, let $T_H$ be a transversal of $H$ in $G$ and let $T_M$ be a transversal of $M$ in $H$. For $d \in H$, we have $Z_d(W) > 1 - \epsilon$. For $t_H \in T_H$ define the channel $\bar{W}(y \mid t_H + t_M + M_{s-1})$ similar to (11). We have shown in Appendix D that if for some $d \in H \setminus M$, $Z_d(W) > 1 - \epsilon$ then $Z_{d + t_H + M}(\bar{W}) > 1 - O(\epsilon)$. Since the input alphabet of the channel $\bar{W}$ has a prime size, we can use [2, Lemma 4] to conclude that $Z(\bar{W}) > 1 - O(\epsilon)$. Now we use [2, Prop. 3] to conclude $I(\bar{W}) < O(\epsilon)$. This result is valid for all $t_H \in T_H$. Since $I(\bar{W})$ is the mutual information between the coset of $M$ inside $t_H + H$ and the output, conditioned on the input lying in $t_H + H$, we conclude that

$$I_H(W) - I_M(W) = \sum_{t_H \in T_H} P(X \in t_H + H)\, I(\bar{W}) \le O(\epsilon)$$

Therefore, for $s = 1, \dots, t$, $I_{M_s}(W) - I_{M_{s-1}}(W) \approx_\epsilon 0$ and hence $I_{M_t}(W) = I_S(W) \approx_\epsilon I_{M_0}(W) = 0$.

For $s = t + 1, \dots, k$ let $H = M_s$ and $M = M_{s-1}$, let $T_H$ be a transversal of $H$ in $G$ and let $T_M$ be a transversal of $M$ in $H$. For $d \in H \setminus M$, we have $Z_d(W) < \epsilon$. For the channel $\bar{W}$ defined as above, we have shown in Appendix C that if for all $d \in H \setminus M$, $Z_d(W) < \epsilon$ then $Z(\bar{W}) < O(\epsilon)$. Therefore, [2, Prop. 3] implies $I(\bar{W}) = \log\frac{|H|}{|M|} - O(\epsilon)$. Similarly to the above, we conclude that

$$I_H(W) - I_M(W) = \log\frac{|H|}{|M|} - O(\epsilon)$$

Therefore, for $s = t + 1, \dots, k$, $I_{M_s}(W) - I_{M_{s-1}}(W) \approx_\epsilon \log\frac{|M_s|}{|M_{s-1}|}$, and hence

$$I_G(W) - I_S(W) \approx_\epsilon \sum_{s=t+1}^{k} \log\frac{|M_s|}{|M_{s-1}|} = \log\frac{|G|}{|S|}$$

Since $I_S(W) \approx_\epsilon 0$, we conclude that $I^0(W) = I_G(W) \approx_\epsilon \log\frac{|G|}{|S|}$. ■

We have shown that the process $I^n$ converges to the following discrete random variable: $I^\infty = \log\frac{|G|}{|H|}$ with probability $p_H$ for $H \le G$.

For $H \le G$, define the random variable $Z^H(W_N^{(i)}) = \sum_{d \notin H} Z_d(W_N^{(i)})$ and the random process $(Z^H)^{(n)}(W) = Z^H(W_N^{(J_n)})$ where $J_n$ is a uniform random variable over $\{1, 2, \dots, N = 2^n\}$. Note that $(Z^H)^{(n)}(W)$ converges almost surely to a random variable $(Z^H)^{(\infty)}(W)$ and $P\left((Z^H)^{(\infty)} = 0\right) = \sum_{S \le H} p_S$.



3) Summary of Channel Transformation: For the channel $(G, \mathcal{Y}, W)$, the convergence of the processes $I^n$ and $(Z^H)^n$ for $H \le G$ implies that for all $\epsilon > 0$, there exists a number $N = N(\epsilon)$ and a partition $\{A_H^\epsilon \mid H \le G\}$ of $\{1, \dots, N\}$ such that for $H \le G$ and $i \in A_H^\epsilon$, $I(W_N^{(i)}) = \log\frac{|G|}{|H|} + O(\epsilon)$ and $Z^H(W_N^{(i)}) = O(\epsilon)$. Moreover, as $\epsilon \to 0$, $\frac{|A_H^\epsilon|}{N} \to p_H$ for some probabilities $p_H$, $H \le G$.

In Appendix F we show that for any $\beta < \frac{1}{2}$ and for $H \le G$,

$$\lim_{n \to \infty} P\left((Z^H)^{(n)} < 2^{-2^{\beta n}}\right) \ge P\left((Z^H)^{(\infty)} = 0\right) = \sum_{S \le H} p_S \qquad (19)$$

This implies that for all $\epsilon > 0$, there exists a number $N = N(\epsilon) = 2^{n(\epsilon)}$ and a partition $\{A_H^\epsilon \mid H \le G\}$ of $\{1, \dots, N\}$ such that for $H \le G$ and $i \in A_H^\epsilon$, $I(W_N^{(i)}) = \log\frac{|G|}{|H|} + O(\epsilon)$ and $Z^H(W_N^{(i)}) < 2^{-2^{\beta n(\epsilon)}}$. Moreover, as $\epsilon \to 0$, $\frac{|A_H^\epsilon|}{N} \to p_H$ for some probabilities $p_H$, $H \le G$.

C. Encoding and Decoding 

At the encoder, if $i \in A_H^\epsilon$ for some $H \le G$, the information symbol is chosen from the transversal $T_H$ arbitrarily. Let $\mathcal{X}_\epsilon^N = \bigoplus_{H \le G} T_H^{A_H^\epsilon}$ be the set of all valid input sequences. As in the $\mathbb{Z}_{p^r}$ case, the message is dithered with a uniformly distributed random vector $b_1^N \in \bigoplus_{H \le G} H^{A_H^\epsilon}$ revealed to both the encoder and the decoder. A message $v_1^N \in \mathcal{X}_\epsilon^N$ is encoded to the vector $x_1^N = (v_1^N + b_1^N) G_N$. Note that $u_1^N = v_1^N + b_1^N$ is uniformly distributed over $G^N$.

At the decoder, after observing the output vector $y_1^N$, for $H \le G$ and $i \in A_H^\epsilon$, use the following decoding rule:

$$\hat{u}_i = f_i(y_1^N, \hat{u}_1^{i-1}) = \operatorname*{argmax}_{g \in b_i + T_H} W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid g)$$

And finally, the message is recovered as $\hat{v}_1^N = \hat{u}_1^N - b_1^N$.

The total number of valid input sequences is equal to

$$2^{NR} = \prod_{H \le G} |T_H|^{|A_H^\epsilon|} = \prod_{H \le G} \left( \frac{|G|}{|H|} \right)^{|A_H^\epsilon|}$$

Therefore the rate is equal to $R = \sum_{H \le G} \frac{|A_H^\epsilon|}{N} \log\frac{|G|}{|H|}$. On the other hand, since $I^n$ is a martingale, we have $E\{I^\infty\} = I^0$. Since $E\{I^\infty\} = \sum_{H \le G} p_H \log\frac{|G|}{|H|}$, we observe that the rate $R$ converges to the symmetric capacity $I^0$ as $\epsilon \to 0$. We will see in the next section that this rate is achievable.

D. Error Analysis 

For $H \le G$ and $i \in A_H^\epsilon$, define the events $B_i$ and $E_i$ according to Equations (13) and (14). Similar to the $\mathbb{Z}_{p^r}$ case, it is straightforward to show that for $H \le G$ and $i \in A_H^\epsilon$, $P(E_i) \le q^2 Z^H(W_N^{(i)})$ where $q = |G|$. The probability of block error is given by $P(\mathrm{err}) = \sum_{H \le G} \sum_{i \in A_H^\epsilon} P(B_i)$. Since $B_i \subseteq E_i$, we get

$$\begin{aligned}
P(\mathrm{err}) &\le \sum_{H \le G} \sum_{i \in A_H^\epsilon} q^2 Z^H(W_N^{(i)}) \\
&\le \sum_{H \le G} |A_H^\epsilon|\, q^2\, 2^{-2^{\beta n}} \\
&\le q^2 N 2^{-2^{\beta n}}
\end{aligned}$$

for any $\beta < \frac{1}{2}$. Therefore, the probability of block error goes to zero as $\epsilon \to 0$ ($n \to \infty$).

VI. Relation to Group Codes 
Recall that for an arbitrary group $G$, the polar encoder of length $N$ introduced in this paper maps the set $\bigoplus_{H \le G} T_H^{A_H}$ to $G^N$ where for a subgroup $H$ of $G$, $T_H$ is a transversal of $H$ and $\{A_H \mid H \le G\}$ is some partition of $\{1, \dots, N\}$. Note that the set of messages $\bigoplus_{H \le G} T_H^{A_H}$ is not necessarily closed under addition and hence in general, the set of encoder outputs is not a subgroup of $G^N$; i.e. the polar codes constructed and analyzed in Sections IV and V are not group encoders. On the contrary, the standard polar codes (i.e. polar codes in which only perfect channels are used) are indeed group codes since their set of messages is of the form $G^A \oplus \{0\}^{\{1, \dots, N\} \setminus A}$ for some $A \subseteq \{1, \dots, N\}$, which is closed under addition.

It is worth mentioning that the polar encoders constructed in this paper fall into a larger class of structured codes called nested group codes. Nested group codes consist of two group codes: the inner code $C_i$ and the outer code $C_o$ such that the inner code is a subgroup of the outer code ($C_i \le C_o$). The set of messages consists of cosets of $C_i$ in $C_o$. For the case of polar codes, the inner code is given by

$$C_i = \Big( \bigoplus_{H \le G} H^{A_H} \Big) G_N$$

and the outer code is the whole group space: $C_o = G^N$. To verify that this is indeed the case, it suffices to show that the set of codewords of polar codes $\big( \bigoplus_{H \le G} T_H^{A_H} \big) G_N$ has only one common element with each coset of $C_i$. Equivalently, it suffices to show that for distinct $m_1, m_2 \in G^N$, if $m_1 G_N - m_2 G_N \in C_i$, then either $m_1 \notin \bigoplus_{H \le G} T_H^{A_H}$ or $m_2 \notin \bigoplus_{H \le G} T_H^{A_H}$.

Lemma VI.1. For $N = 2^n$ where $n$ is a positive integer, the generator matrix corresponding to polar codes $G_N = B_N F^{\otimes n}$ is full rank.

Proof: Since $G_N = B_N F^{\otimes n}$ where $B_N$ is a permutation of rows, it suffices to show that $F^{\otimes n}$ is full rank. Note that the rank of the Kronecker product of two matrices is equal to the product of the ranks of the matrices and the rank of $F$ is equal to 2. Hence we have $\operatorname{rank}(G_N) = \operatorname{rank}(F^{\otimes n}) = 2^n = N$. ■

This lemma implies that if $m_1 G_N - m_2 G_N \in C_i$ then $m_1 - m_2 \in \bigoplus_{H \le G} H^{A_H}$. Since two distinct elements of a transversal $T_H$ never differ by an element of $H$, this means either $m_1 \notin \bigoplus_{H \le G} T_H^{A_H}$ or $m_2 \notin \bigoplus_{H \le G} T_H^{A_H}$. This proves that polar codes are indeed nested group codes.
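The full-rank property of Lemma VI.1 can be checked directly for small $n$. The sketch below assumes the usual kernel $F = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ and takes $B_N$ to be the bit-reversal row permutation; both are assumptions here, and all function names are illustrative. Since $F^{\otimes n}$ is lower triangular with ones on the diagonal, its determinant is 1, so it is invertible over the integers and therefore acts invertibly on $G^N$ for any Abelian group $G$.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def kron_power(F, n):
    """n-fold Kronecker power of F (the 1x1 identity for n = 0)."""
    M = [[1]]
    for _ in range(n):
        M = kron(M, F)
    return M


def bit_reversal_rows(M):
    """Permute the rows of a 2^n x 2^n matrix by bit reversal of the row index."""
    N = len(M)
    n = N.bit_length() - 1

    def rev(i):
        return int(format(i, "b").zfill(n)[::-1], 2) if n else 0

    return [M[rev(i)] for i in range(N)]


def is_unit_lower_triangular(M):
    """True if M has ones on the diagonal and zeros above it (so det M = 1)."""
    size = len(M)
    return all(M[i][i] == 1 for i in range(size)) and all(
        M[i][j] == 0 for i in range(size) for j in range(i + 1, size)
    )


F = [[1, 0], [1, 1]]          # assumed polarization kernel
G4 = bit_reversal_rows(kron_power(F, 2))
F8_is_unit_triangular = is_unit_lower_triangular(kron_power(F, 3))
```

A row permutation changes the determinant only by a sign, so $G_N$ inherits invertibility from $F^{\otimes n}$.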

In this section, we consider two examples of channels over $\mathbb{Z}_4$. The first example is Channel 1 introduced in Section III. Based on the symmetry of this channel, we show that polar codes achieve the group capacity of this specific channel. The intent of the second example is to show that in general, polar codes do not achieve the group capacity of channels. In order to find the capacity of polar codes as group codes, we use the standard construction of polar codes, i.e. we only use perfect channels and fix partially perfect and useless channels.

A. Example 1 

Consider Channel 1 of Figure 1. Define $H_0 = \{0, 1, 2, 3\}$, $H_1 = \{0, 2\}$ and $H_2 = \{0\}$ and define $K_0 = \{1, 3\}$, $K_1 = \{2\}$ and $K_2 = \{0\}$. For this channel we have:

$$I_4^0 = I(X; Y) = 2 - \epsilon - 2\lambda$$

$$I_2^0 = I(X_1; Y) = 1 - (\epsilon + \lambda)$$

$$(I'_2)^0 = I(X'_1; Y) = 1 - (\epsilon + \lambda) = I_2^0$$

where $X$ is uniform over $\mathbb{Z}_4$, $X_1$ is uniform over $H_1$ and $X'_1$ is uniform over $1 + H_1$. The capacity of group codes over this symmetric channel is equal to [10]:

$$C = \min\left(I_4^0, I_2^0 + (I'_2)^0\right) = \min(2 - \epsilon - 2\lambda, 2 - 2\epsilon - 2\lambda) = 2 - 2\epsilon - 2\lambda$$

All possible cases for this channel are

• Case 0: $Z_1^\infty = Z_3^\infty = 1, Z_2^\infty = 1$

• Case 1: $Z_1^\infty = Z_3^\infty = 0, Z_2^\infty = 1$

• Case 2: $Z_1^\infty = Z_3^\infty = 0, Z_2^\infty = 0$

As we saw in Figures 2 and 3, this result agrees with the asymptotic behavior of $I^n$ predicted by the recursion formulas (1) and (2).
Define $I_4(W^{b_1 b_2 \cdots b_n}) = I(X; Y)$ where $X, Y$ are the input and output of $W^{b_1 b_2 \cdots b_n}$ and $X$ is uniform over $\mathbb{Z}_4$. Similarly, define $I_2(W^{b_1 b_2 \cdots b_n}) = I(X_1; Y)$ where $X_1, Y$ are the input and output of $W^{b_1 b_2 \cdots b_n}$ and $X_1$ is uniform over $H_1$, and define $I'_2(W^{b_1 b_2 \cdots b_n}) = I(X'_1; Y)$ where $X'_1, Y$ are the input and output of $W^{b_1 b_2 \cdots b_n}$ and $X'_1$ is uniform over $1 + H_1$. Define the mutual information processes $I_4^n$, $I_2^n$ and $(I'_2)^n$ to be equal to $I_4(W^{b_1 b_2 \cdots b_n})$, $I_2(W^{b_1 b_2 \cdots b_n})$ and $I'_2(W^{b_1 b_2 \cdots b_n})$ where for $i = 1, \dots, n$, the $b_i$'s are iid Bernoulli(0.5) random variables. For this channel, we can show that $I_2(W^{b_1 b_2 \cdots b_n}) = I'_2(W^{b_1 b_2 \cdots b_n}) = 1 - (\epsilon_n + \lambda_n)$ and conclude that $(I_2 + I'_2)^n = I_2^n + (I'_2)^n$ is a martingale. Therefore $I_4^n$ and $(I_2 + I'_2)^n$ converge almost surely to random variables $I_4^\infty$ and $(I_2 + I'_2)^\infty$ respectively. This observation provides us with an ad-hoc way to find the probabilities $p_t$, $t = 0, 1, 2$, of the limit random variable $I_4^\infty$ for this simple channel. We can show the following for the final states:

• case 0 $\Rightarrow I_4^\infty = 0$, $(I_2 + I'_2)^\infty = 0$

• case 1 $\Rightarrow I_4^\infty = 1$, $(I_2 + I'_2)^\infty = 0$

• case 2 $\Rightarrow I_4^\infty = 2$, $(I_2 + I'_2)^\infty = 2$

Therefore we obtain the following three equations:

$$E\{I_4^\infty\} = p_0 \cdot 0 + p_1 \cdot 1 + p_2 \cdot 2 = I_4^0 = 2 - \epsilon - 2\lambda$$

$$E\{(I_2 + I'_2)^\infty\} = p_0 \cdot 0 + p_1 \cdot 0 + p_2 \cdot 2 = (I_2 + I'_2)^0 = 2 - 2\epsilon - 2\lambda$$

$$p_0 + p_1 + p_2 = 1$$

Solving this system of equations, we obtain:

$$p_2 = 1 - \epsilon - \lambda = C/2$$

$$p_1 = I_4^0 - (I_2 + I'_2)^0 = \epsilon$$

$$p_0 = 1 - \left( I_4^0 - (I_2 + I'_2)^0 / 2 \right) = \lambda$$

Only the perfect channels (case 2) are used by the standard construction, and each carries 2 bits, so the resulting rate is $2 p_2 = C$. This equals the capacity of the channel achievable using group codes and therefore, polar codes achieve the capacity of group codes for this channel.
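The three-equation system of this example is small enough to solve mechanically. The following sketch plugs the closed-form values $I_4^0 = 2 - \epsilon - 2\lambda$ and $(I_2 + I'_2)^0 = 2 - 2\epsilon - 2\lambda$ into the system; the numerical parameters are arbitrary illustrations and the function name is not from the paper.

```python
def limit_probabilities(eps, lam):
    """Solve for (p_0, p_1, p_2) of Channel 1 from the two martingale identities.

    E{I_4^inf}          = p_1 + 2 p_2 = 2 - eps - 2 lam
    E{(I_2+I_2')^inf}   = 2 p_2       = 2 - 2 eps - 2 lam
    p_0 + p_1 + p_2     = 1
    """
    i4 = 2 - eps - 2 * lam
    i2_sum = 2 - 2 * eps - 2 * lam
    p2 = i2_sum / 2
    p1 = i4 - i2_sum
    p0 = 1 - p1 - p2
    return p0, p1, p2


# Illustrative parameters for Channel 1.
p0, p1, p2 = limit_probabilities(0.1, 0.2)
group_capacity = 2 - 2 * 0.1 - 2 * 0.2
```

For these values the solution is $(p_0, p_1, p_2) = (\lambda, \epsilon, 1 - \epsilon - \lambda)$, and $2 p_2$ coincides with the group capacity $C$.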

B. Example 2 

The channel is depicted in Figure 7. We call this Channel 3. For this channel, when $\lambda = 0.2$ we have:

$$I_4^0 = I(X; Y) = 0.6390$$

$$(I_2 + I'_2)^0 = 0.2161$$

Fig. 7: Channel 3

The rate $C = \min\left(I_4^0, (I_2 + I'_2)^0\right) = (I_2 + I'_2)^0 = 0.2161$ is achievable using group codes over this channel [10].

For this channel we have three possible asymptotic cases:

• Case 0: $Z_1^\infty = 1, Z_2^\infty = 1 \Rightarrow I_4^\infty = 0$, $(I_2 + I'_2)^\infty = 0$

• Case 1: $Z_1^\infty = 0, Z_2^\infty = 1 \Rightarrow I_4^\infty = 1$, $(I_2 + I'_2)^\infty = 0$

• Case 2: $Z_1^\infty = 0, Z_2^\infty = 0 \Rightarrow I_4^\infty = 2$, $(I_2 + I'_2)^\infty = 2$

Therefore we obtain the following three equations:

$$E\{I_4^\infty\} = p_0 \cdot 0 + p_1 \cdot 1 + p_2 \cdot 2$$

$$E\{(I_2 + I'_2)^\infty\} = p_0 \cdot 0 + p_1 \cdot 0 + p_2 \cdot 2$$

$$p_0 + p_1 + p_2 = 1$$

Therefore, the achievable rate using polar codes over this channel is equal to $R = 2 p_2 = E\{(I_2 + I'_2)^\infty\}$. We have $E\{(I_2 + I'_2)^1\} = 0.2063$ which is strictly less than $(I_2 + I'_2)^0$. The following lemma implies $R = E\{(I_2 + I'_2)^\infty\} \le E\{(I_2 + I'_2)^1\} < C = (I_2 + I'_2)^0$ and completes the proof.

Lemma VI.2. For a channel $(\mathbb{Z}_4, \mathcal{Y}, W)$, the process $(I_2 + I'_2)^n$, $n = 0, 1, 2, \dots$ is a super-martingale.

Proof: Follows from Lemma IV.1 with $H = \{0, 2\}$. ■



VII. Conclusion

It has been shown that the original construction of polar codes suffices to achieve the symmetric 
capacity of discrete memoryless channels with arbitrary input alphabet sizes. It is shown that in general, 
channel polarization happens in several levels so that some synthesized channels are partially perfect and 
there needs to be a modification of the coding scheme to exploit these channels. It has also been shown 
that polar codes do not generally achieve the capacity of arbitrary channels achievable using group codes. 
Appendix

A. Polar Codes Over Abelian Groups 

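Given a $k \times n$ matrix $G_n$ of 0's and 1's, one can construct a group code as follows: given any message tuple $u_1^k \in G^k$, encode it to $u_1^k \cdot G_n$, where the elements of $G_n$ determine whether an element of $u_1^k$ appears as a summand in the encoded word or not. For example consider the generator matrix

$$G_4 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix}$$

Then $u_1^4 \cdot G_4$ is defined as

$$[u_1\ u_2\ u_3\ u_4] \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix} = [\,u_1 + u_2 + u_3 + u_4,\ \ u_3 + u_4,\ \ u_2 + u_4,\ \ u_4\,]$$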

Using this convention, we can define a group code based on a given binary matrix without actually 
defining a multiplication operation for the group. 
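The convention above translates directly into code: each output coordinate is the group sum of the message symbols selected by the ones in the corresponding column. The sketch below uses $\mathbb{Z}_4$ as the group purely for illustration; the function name and the sample message are assumptions.

```python
def encode(u, G, q):
    """Compute u . G over Z_q using only additions.

    Entry j of the result is the sum (mod q) of those u[i] for which
    G[i][j] == 1; no multiplication in the group is needed.
    """
    return [
        sum(u[i] for i in range(len(u)) if G[i][j]) % q
        for j in range(len(G[0]))
    ]


G4 = [
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]

# Sample message over Z_4: (u1, u2, u3, u4) = (1, 2, 3, 0).
x = encode([1, 2, 3, 0], G4, 4)
```

For this message the output coordinates are $u_1 + u_2 + u_3 + u_4$, $u_3 + u_4$, $u_2 + u_4$ and $u_4$, each reduced modulo 4.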

B. Recursion Formula for Channel 1 

1) Recursion for $W^+$: We show that $W^+$ (corresponding to $b_1 = 1$) is equivalent to a channel of the same type as $W$ but with different parameters $\epsilon_1$ and $\lambda_1$ corresponding to $\epsilon$ and $\lambda$ respectively, where

$$\epsilon_1 = \epsilon^2 + 2\epsilon\lambda$$

$$\lambda_1 = \lambda^2$$

We say an output tuple $(y_1, y_2, u_1)$ is connected to an input $u_2 \in \mathbb{Z}_4$ if $W^+(y_1, y_2, u_1 \mid u_2) = \frac{1}{4} W(y_1 \mid u_1 + u_2) W(y_2 \mid u_2)$ is strictly positive.

First, let us assume the output tuple $(y_1, y_2, u_1)$ is connected to all $u_2 \in \mathbb{Z}_4$. Then $W(y_2 \mid u_2)$ must be nonzero for all $u_2$ and hence $y_2 = E_3$. Similarly, since $W(y_1 \mid u_1 + u_2)$ is nonzero for all $u_2$ (and hence all $u_1 + u_2$), it follows that $y_1 = E_3$. Therefore $W^+(E_3, E_3, u_1 \mid u_2) = \frac{1}{4}\lambda^2$ for all $u_1, u_2 \in \mathbb{Z}_4$ and these are all the output tuples connected to all inputs (with positive probability). Since all of these output tuples are equivalent, we can combine them to get a single output symbol connected to all four inputs with probability $\lambda^2$.
Next we show that if an output tuple is connected to an input from $\{0, 2\}$ and an input from $\{1, 3\}$, then it is connected to all inputs. Consider the case where the output tuple $(y_1, y_2, u_1)$ is connected to both 0 and 1, i.e. $W^+(y_1, y_2, u_1 \mid 0)$ and $W^+(y_1, y_2, u_1 \mid 1)$ are both nonzero. Then since $W(y_2 \mid 0) \ne 0$ and $W(y_2 \mid 1) \ne 0$, it follows that $y_2 = E_3$. Similarly, since $W(y_1 \mid u_1) \ne 0$ and $W(y_1 \mid u_1 + 1) \ne 0$, it follows that $y_1 = E_3$. We have already seen that for all $u_1 \in \mathbb{Z}_4$, the output tuple $(E_3, E_3, u_1)$ is connected to all input symbols. The proof is similar for the other three cases, i.e. when $(y_1, y_2, u_1)$ is connected to 0 and 3, when $(y_1, y_2, u_1)$ is connected to 2 and 1, and when $(y_1, y_2, u_1)$ is connected to 2 and 3.

Next we find all output tuples which are connected to both 0 and 2 but are not connected to 1 or 3. Let $(y_1, y_2, u_1)$ be an output tuple such that $W^+(y_1, y_2, u_1 \mid 0) \ne 0$, $W^+(y_1, y_2, u_1 \mid 2) \ne 0$, $W^+(y_1, y_2, u_1 \mid 1) = 0$ and $W^+(y_1, y_2, u_1 \mid 3) = 0$.

First assume $u_1 \in \{0, 2\}$. Since $W(y_2 \mid 0) \ne 0$ and $W(y_2 \mid 2) \ne 0$, it follows that $y_2 \in \{E_1, E_3\}$, and since $W(y_1 \mid u_1) \ne 0$ and $W(y_1 \mid u_1 + 2) \ne 0$, it follows that $y_1 \in \{E_1, E_3\}$. Note that for $y_1 = E_3$ and $y_2 = E_3$, the output tuple is connected to all inputs and therefore all possible cases are $y_1 = E_1, y_2 = E_1$; $y_1 = E_1, y_2 = E_3$; and $y_1 = E_3, y_2 = E_1$. In all cases it can be shown that $W^+(y_1, y_2, u_1 \mid 1) = 0$ and $W^+(y_1, y_2, u_1 \mid 3) = 0$. Hence for $u_1 \in \{0, 2\}$, $(E_1, E_1, u_1)$ is connected to 0 and 2 with probabilities $\frac{1}{4}\epsilon^2$ and is not connected to 1 or 3. $(E_1, E_3, u_1)$ is connected to 0 and 2 with probabilities $\frac{1}{4}\epsilon\lambda$ and is not connected to 1 or 3. $(E_3, E_1, u_1)$ is connected to 0 and 2 with probabilities $\frac{1}{4}\epsilon\lambda$ and is not connected to 1 or 3.

Now assume $u_1 \in \{1, 3\}$. Same as above we have $y_2 \in \{E_1, E_3\}$, and since $W(y_1 \mid u_1) \ne 0$ and $W(y_1 \mid u_1 + 2) \ne 0$, it follows that $y_1 \in \{E_2, E_3\}$. In this case, all possible cases are $y_1 = E_2, y_2 = E_1$; $y_1 = E_2, y_2 = E_3$; and $y_1 = E_3, y_2 = E_1$. In all cases it can be shown that $W^+(y_1, y_2, u_1 \mid 1) = 0$ and $W^+(y_1, y_2, u_1 \mid 3) = 0$. Hence for $u_1 \in \{1, 3\}$, $(E_2, E_1, u_1)$ is connected to 0 and 2 with probabilities $\frac{1}{4}\epsilon^2$ and is not connected to 1 or 3. $(E_2, E_3, u_1)$ is connected to 0 and 2 with probabilities $\frac{1}{4}\epsilon\lambda$ and is not connected to 1 or 3. $(E_3, E_1, u_1)$ is connected to 0 and 2 with probabilities $\frac{1}{4}\epsilon\lambda$ and is not connected to 1 or 3.

Therefore, there are four equivalent outputs connected to 0 and 2 with probabilities $\frac{1}{4}\epsilon^2$ and not connected to 1 or 3, and there are eight equivalent outputs connected to 0 and 2 with probabilities $\frac{1}{4}\epsilon\lambda$ and not connected to 1 or 3. Since all of these outputs are equivalent, we can combine them into one output connected to 0 and 2 with probabilities

$$4\left(\frac{1}{4}\epsilon^2\right) + 8\left(\frac{1}{4}\epsilon\lambda\right) = \epsilon^2 + 2\epsilon\lambda$$

Now we find all output tuples which are connected to both 1 and 3 but are not connected to or 2. Let 
(l/i, 2/2, lii) be an output tuple such that W^+(yl,^/2,^il|l) / 0,W^{yi,y2,ui\3) / 0, W+{yi,y2,ui\0) = 
Oand W+{yi,y2,ui\2) = 0. 

First assume $u_1 \in \{0, 2\}$. Since $W(y_2|1) \neq 0$ and $W(y_2|3) \neq 0$, it follows that $y_2 \in \{E_2, E_3\}$, and since $W(y_1|u_1 + 1) \neq 0$ and $W(y_1|u_1 + 3) \neq 0$, it follows that $y_1 \in \{E_2, E_3\}$. Note that for $y_1 = E_3$ and $y_2 = E_3$, the output tuple is connected to all inputs and therefore all possible cases are $y_1 = E_2, y_2 = E_2$; $y_1 = E_2, y_2 = E_3$; and $y_1 = E_3, y_2 = E_2$. In all cases it can be shown that $W^+(y_1, y_2, u_1|0) = 0$ and $W^+(y_1, y_2, u_1|2) = 0$. Hence for $u_1 \in \{0, 2\}$, $(E_2, E_2, u_1)$ is connected to 1 and 3 with probabilities $\frac{1}{4}\epsilon^2$ and is not connected to 0 or 2. $(E_2, E_3, u_1)$ is connected to 1 and 3 with probabilities $\frac{1}{4}\epsilon\lambda$ and is not connected to 0 or 2. $(E_3, E_2, u_1)$ is connected to 1 and 3 with probabilities $\frac{1}{4}\epsilon\lambda$ and is not connected to 0 or 2.

Now assume $u_1 \in \{1, 3\}$. Same as above we have $y_2 \in \{E_2, E_3\}$ and since $W(y_1|u_1 + 1) \neq 0$ and $W(y_1|u_1 + 3) \neq 0$, it follows that $y_1 \in \{E_1, E_3\}$. In this case, all possible cases are $y_1 = E_1, y_2 = E_2$; $y_1 = E_1, y_2 = E_3$; and $y_1 = E_3, y_2 = E_2$. In all cases it can be shown that $W^+(y_1, y_2, u_1|0) = 0$ and $W^+(y_1, y_2, u_1|2) = 0$. Hence for $u_1 \in \{1, 3\}$, $(E_1, E_2, u_1)$ is connected to 1 and 3 with probabilities $\frac{1}{4}\epsilon^2$ and is not connected to 0 or 2. $(E_1, E_3, u_1)$ is connected to 1 and 3 with probabilities $\frac{1}{4}\epsilon\lambda$ and is not connected to 0 or 2. $(E_3, E_2, u_1)$ is connected to 1 and 3 with probabilities $\frac{1}{4}\epsilon\lambda$ and is not connected to 0 or 2.

Therefore, there are four equivalent outputs connected to 1 and 3 with probabilities $\frac{1}{4}\epsilon^2$ and not connected to 0 or 2, and there are eight equivalent outputs connected to 1 and 3 with probabilities $\frac{1}{4}\epsilon\lambda$ and not connected to 0 or 2. Same as above, since all of these outputs are equivalent, we can combine them into one output connected to 1 and 3 with probabilities $\epsilon^2 + 2\epsilon\lambda$.

We have shown that there is (equivalently) one channel output (call it $E_3$) connected to all inputs $u_2 \in \mathbb{Z}_4$ with conditional probability $\lambda_1 = \lambda^2$, and we have shown that if a channel output is connected to more than one input but is not connected to all inputs, it is either connected to $\{0, 2\}$ and is not connected to $\{1, 3\}$ (call it $E_1$) or it is connected to $\{1, 3\}$ and is not connected to $\{0, 2\}$ (call it $E_2$). Inputs 0 and 2 are connected to $E_1$ with probabilities $\epsilon_1 = \epsilon^2 + 2\epsilon\lambda$, and inputs 1 and 3 are connected to $E_2$ with probabilities $\epsilon_1 = \epsilon^2 + 2\epsilon\lambda$. Then for each input $u_2 \in \mathbb{Z}_4$ there exist several outputs which are connected only to $u_2$ and not to other inputs and whose probabilities add up to $1 - \epsilon_1 - \lambda_1$. This completes the proof for $W^+$.

2) Recursion for $W^-$: We show that $W^-$ (corresponding to $b_1 = 0$) is equivalent to a channel of the same type as $W$ but with different parameters $\epsilon_1$ and $\lambda_1$ corresponding to $\epsilon$ and $\lambda$ respectively, where

\[
\epsilon_1 = 2\epsilon - (\epsilon^2 + 2\epsilon\lambda), \qquad \lambda_1 = 2\lambda - \lambda^2 .
\]

Note that each channel output is a pair $(y_1, y_2) \in \{0, 1, 2, 3, E_1, E_2, E_3\}^2$. The channel $W^-$ can be shown to be as follows:

Output pairs $(0,0)$, $(1,1)$, $(2,2)$, $(3,3)$ are only connected to input 0, each with conditional probability $\frac{1}{4}(1 - \epsilon - \lambda)^2$. This is equivalent to one channel output only connected to 0 with probability $(1 - \epsilon - \lambda)^2$.

Output pairs $(0,2)$, $(1,3)$, $(2,0)$, $(3,1)$ are only connected to input 2, each with conditional probability $\frac{1}{4}(1 - \epsilon - \lambda)^2$. This is equivalent to one channel output only connected to 2 with probability $(1 - \epsilon - \lambda)^2$.

Output pairs $(0,3)$, $(1,0)$, $(2,1)$, $(3,2)$ are only connected to input 1, each with conditional probability $\frac{1}{4}(1 - \epsilon - \lambda)^2$. This is equivalent to one channel output only connected to 1 with probability $(1 - \epsilon - \lambda)^2$.

Output pairs $(0,1)$, $(1,2)$, $(2,3)$, $(3,0)$ are only connected to input 3, each with conditional probability $\frac{1}{4}(1 - \epsilon - \lambda)^2$. This is equivalent to one channel output only connected to 3 with probability $(1 - \epsilon - \lambda)^2$.

Output pairs $(0,E_1)$, $(1,E_2)$, $(2,E_1)$, $(3,E_2)$, $(E_1,0)$, $(E_1,2)$, $(E_2,1)$, $(E_2,3)$ are only connected to inputs 0 and 2, each with conditional probability $\frac{1}{4}\epsilon(1 - \epsilon - \lambda)$. Output pairs $(E_1, E_1)$, $(E_2, E_2)$ are only connected to inputs 0 and 2, each with conditional probability $\frac{1}{2}\epsilon^2$. This is equivalent to one channel output only connected to 0 and 2 with probability

\[
\epsilon_1 = 8 \times \tfrac{1}{4}\epsilon(1 - \epsilon - \lambda) + 2 \times \tfrac{1}{2}\epsilon^2 = 2\epsilon - (\epsilon^2 + 2\epsilon\lambda).
\]

Output pairs $(0,E_2)$, $(1,E_1)$, $(2,E_2)$, $(3,E_1)$, $(E_1,1)$, $(E_1,3)$, $(E_2,0)$, $(E_2,2)$ are only connected to inputs 1 and 3, each with conditional probability $\frac{1}{4}\epsilon(1 - \epsilon - \lambda)$. Output pairs $(E_1, E_2)$, $(E_2, E_1)$ are only connected to inputs 1 and 3, each with conditional probability $\frac{1}{2}\epsilon^2$. This is equivalent to one channel output only connected to 1 and 3 with probability $2\epsilon - (\epsilon^2 + 2\epsilon\lambda)$.

Output pairs $(0,E_3)$, $(1,E_3)$, $(2,E_3)$, $(3,E_3)$, $(E_3,0)$, $(E_3,1)$, $(E_3,2)$, $(E_3,3)$ are connected to all inputs, each with conditional probability $\frac{1}{4}\lambda(1 - \epsilon - \lambda)$. Output pairs $(E_1, E_3)$, $(E_2, E_3)$, $(E_3, E_1)$, $(E_3, E_2)$ are connected to all inputs, each with conditional probability $\frac{1}{2}\epsilon\lambda$. Output pair $(E_3, E_3)$ is connected to all inputs with conditional probability $\lambda^2$. This is equivalent to one channel output connected to all inputs with probability

\[
\lambda_1 = 8 \times \tfrac{1}{4}\lambda(1 - \epsilon - \lambda) + 4 \times \tfrac{1}{2}\epsilon\lambda + \lambda^2 = 2\lambda - \lambda^2 .
\]

We have listed all 49 channel outputs and the corresponding probabilities. This completes the proof for $W^-$.
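The two recursions of this appendix can be checked by brute force. The sketch below is not part of the original proof. It assumes the base channel implied by the transition probabilities listed above (input $x$ is received as $x$ with probability $1-\epsilon-\lambda$, as $E_1$ or $E_2$ according to its coset of $\{0,2\}$ with probability $\epsilon$, and as $E_3$ with probability $\lambda$) and the transform $x_1 = u_1 + u_2$, $x_2 = u_2$ with uniform inputs. All function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

INPUTS = range(4)                                  # the group Z_4
OUTPUTS = [0, 1, 2, 3, "E1", "E2", "E3"]

def base_channel(eps, lam):
    """Assumed W(y|x): y = x w.p. 1-eps-lam, coset erasure w.p. eps, E3 w.p. lam."""
    W = {(y, x): Fraction(0) for y in OUTPUTS for x in INPUTS}
    for x in INPUTS:
        W[x, x] = 1 - eps - lam
        W["E1" if x in (0, 2) else "E2", x] = eps
        W["E3", x] = lam
    return W

def minus(W):
    """W^-(y1,y2|u1) = (1/4) * sum over u2 of W(y1|u1+u2) W(y2|u2)."""
    return {((y1, y2), u1): sum(Fraction(1, 4) * W[y1, (u1 + u2) % 4] * W[y2, u2]
                                for u2 in INPUTS)
            for y1, y2 in product(OUTPUTS, repeat=2) for u1 in INPUTS}

def plus(W):
    """W^+(y1,y2,u1|u2) = (1/4) * W(y1|u1+u2) W(y2|u2)."""
    return {((y1, y2, u1), u2): Fraction(1, 4) * W[y1, (u1 + u2) % 4] * W[y2, u2]
            for y1, y2 in product(OUTPUTS, repeat=2)
            for u1 in INPUTS for u2 in INPUTS}

def merged(V):
    """Merge equivalent outputs: map each support set to its total probability."""
    mass = defaultdict(Fraction)
    for out in {o for o, _ in V}:
        support = frozenset(u for u in INPUTS if V[out, u] != 0)
        if not support:
            continue                               # this output never occurs
        probs = {V[out, u] for u in support}
        assert len(probs) == 1                     # connected inputs are treated alike
        mass[support] += probs.pop()
    return dict(mass)

if __name__ == "__main__":
    eps, lam = Fraction(1, 5), Fraction(1, 10)
    W = base_channel(eps, lam)
    print("W^-:", merged(minus(W)))
    print("W^+:", merged(plus(W)))
```

Exact rational arithmetic is used so that the merged masses can be compared with the closed-form expressions for $\epsilon_1$ and $\lambda_1$ without rounding tolerance.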



C. Upper Bound on $Z(W)$

Assume Z^' {W) < e. This implies 

$Z_d(W) < \epsilon$ for all $d \in H \setminus M$. Therefore, for each $x \in G$,

\[
\sum_{y \in \mathcal{Y}} \sqrt{W(y|x)\, W(y|x + d)} < q\epsilon \tag{20}
\]



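The Bhattacharyya parameter of the channel $W$ is given by:

\begin{align*}
Z(W) &= \frac{1}{q(q-1)} \sum_{\substack{t_M, t'_M \in T_M \\ t_M \neq t'_M}} \sum_{y \in \mathcal{Y}} \sqrt{W(y|t_H + t_M + M)\, W(y|t_H + t'_M + M)} \\
&= \frac{1}{q(q-1)} \sum_{\substack{t_M, t'_M \in T_M \\ t_M \neq t'_M}} \sum_{y \in \mathcal{Y}} \sqrt{\left(\frac{1}{|M|} \sum_{m \in M} W(y|t_H + t_M + m)\right)\left(\frac{1}{|M|} \sum_{m' \in M} W(y|t_H + t'_M + m')\right)} \\
&= \frac{1}{q(q-1)|M|} \sum_{\substack{t_M, t'_M \in T_M \\ t_M \neq t'_M}} \sum_{y \in \mathcal{Y}} \sqrt{\sum_{m, m' \in M} W(y|t_H + t_M + m)\, W(y|t_H + t'_M + m')} \\
&\le \frac{1}{q(q-1)} \frac{1}{|M|} \sum_{\substack{t_M, t'_M \in T_M \\ t_M \neq t'_M}} \sum_{y \in \mathcal{Y}} \sum_{m, m' \in M} \sqrt{W(y|t_H + t_M + m)\, W(y|t_H + t'_M + m')}
\end{align*}

Let $x = t_H + t_M + m$ and $x' = t_H + t'_M + m'$. Note that $x - x' = t_M - t'_M + m - m' \in H$ since $t_M, t'_M, m, m' \in H$. Also note that since $t_M \neq t'_M$ and $m - m' \in M$, it follows that $x - x' \notin M$. Now we use (20) to conclude:

\[
Z(W) < \frac{1}{q(q-1)} \frac{1}{|M|} \sum_{\substack{t_M, t'_M \in T_M \\ t_M \neq t'_M}} \sum_{m, m' \in M} q\epsilon = \frac{|T_M|\,(|T_M| - 1)\,|M|}{q - 1}\,\epsilon = O(\epsilon).
\]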

Remark A.1. For an arbitrary Abelian group $G$, let $H \le G$ be an arbitrary subgroup and let $M$ be any maximal subgroup of $H$. If for all $d \in H \setminus M$, $Z_d(W) < \epsilon$, then with a similar argument as above we can show that $Z(W) < O(\epsilon)$ where $W$ is defined by (11).
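The step that moves the square root inside the sum over $m, m'$ can be checked numerically. The sketch below is an illustration under assumptions, not the paper's construction: it takes $G = \mathbb{Z}_4$, $M = \{0, 2\}$, a randomly drawn test channel, and the coset-averaged law $W(y|t + M) = \frac{1}{|M|}\sum_{m \in M} W(y|t + m)$. It verifies the upper bound used in this appendix together with the matching lower bound that follows from concavity of the square root. All names are hypothetical.

```python
import math
import random

def random_channel(q, n_out, rng):
    """Return W[x][y], a random row-stochastic matrix (hypothetical test channel)."""
    rows = []
    for _ in range(q):
        row = [rng.random() for _ in range(n_out)]
        total = sum(row)
        rows.append([v / total for v in row])
    return rows

def bhattacharyya(p, r):
    """Sum over y of sqrt(p(y) r(y)) for two output distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, r))

def coset_row(W, t, M):
    """Output distribution when the input is uniform on the coset t + M."""
    q = len(W)
    return [sum(W[(t + m) % q][y] for m in M) / len(M) for y in range(len(W[0]))]

def sandwich(W, t, t2, M):
    """Return (lower, middle, upper) for the coset pair (t + M, t2 + M)."""
    q = len(W)
    pair_sum = sum(bhattacharyya(W[(t + m) % q], W[(t2 + m2) % q])
                   for m in M for m2 in M)
    middle = bhattacharyya(coset_row(W, t, M), coset_row(W, t2, M))
    return pair_sum / len(M) ** 2, middle, pair_sum / len(M)

if __name__ == "__main__":
    rng = random.Random(0)
    W = random_channel(4, 5, rng)
    print(sandwich(W, 0, 1, [0, 2]))
```

The upper value is the one bounded through (20); the lower value is the quantity that appears in the alternate proof of Appendix E.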

D. Lower Bound on $Z_{d'+t_H+M}(W)$
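Assume $Z_{d'}(W) > 1 - \epsilon$. Define

\[
D_{d'}(W) = \frac{1}{2q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \left| W(y|x) - W(y|x + d') \right| .
\]

First we show that $Z_{d'}(W) > 1 - \epsilon$ implies $D_{d'}(W) < O(\epsilon)$. Define the following quantities:

\[
q_{x,y} = \frac{W(y|x) + W(y|x + d')}{2}, \qquad \delta_{x,y} = \frac{1}{2} \left| W(y|x) - W(y|x + d') \right| .
\]

Then we have

\[
Z_{d'}(W) = \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \sqrt{(q_{x,y} - \delta_{x,y})(q_{x,y} + \delta_{x,y})} .
\]

Also we have

\[
D_{d'}(W) = \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \delta_{x,y}
\]

and

\[
\frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} q_{x,y} = 1 .
\]

Note that

\begin{align*}
Z_{d'}(W) &= \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \sqrt{q_{x,y}^2 - \delta_{x,y}^2} \\
&\le \max_{\substack{0 \le \delta_{x,y} \le q_{x,y} \\ \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \delta_{x,y} = D_{d'}(W)}} \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \sqrt{q_{x,y}^2 - \delta_{x,y}^2} .
\end{align*}

The Lagrangian for this optimization problem is given by

\[
L = \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \sqrt{q_{x,y}^2 - \delta_{x,y}^2} - \lambda \left( \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \delta_{x,y} - D_{d'}(W) \right) ;
\]

we have

\[
\frac{\partial L}{\partial \delta_{x,y}} = -\frac{1}{q} \frac{\delta_{x,y}}{\sqrt{q_{x,y}^2 - \delta_{x,y}^2}} - \frac{\lambda}{q}
\]

and

\[
\frac{\partial^2 L}{\partial \delta_{x,y}^2} = -\frac{1}{q} \frac{q_{x,y}^2}{\left( q_{x,y}^2 - \delta_{x,y}^2 \right)^{3/2}} < 0 .
\]

Define $\gamma = -\lambda$ to get $\delta_{x,y} = \frac{\gamma}{\sqrt{1 + \gamma^2}}\, q_{x,y}$. We have $\frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} q_{x,y} = 1$, therefore,

\begin{align*}
D_{d'}(W) = \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \delta_{x,y}
&= \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \frac{\gamma}{\sqrt{1 + \gamma^2}}\, q_{x,y} \\
&= \frac{\gamma}{\sqrt{1 + \gamma^2}} \cdot \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} q_{x,y} \\
&= \frac{\gamma}{\sqrt{1 + \gamma^2}} .
\end{align*}

Therefore we have $D_{d'}(W) = \frac{\gamma}{\sqrt{1 + \gamma^2}}$ and hence $\delta_{x,y} = D_{d'}(W)\, q_{x,y}$. For this choice of $\delta_{x,y}$ we have

\[
\frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \sqrt{q_{x,y}^2 - \delta_{x,y}^2} = \sqrt{1 - D_{d'}(W)^2} \cdot \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} q_{x,y} = \sqrt{1 - D_{d'}(W)^2} .
\]

Therefore, we have shown that $Z_{d'}(W) \le \sqrt{1 - D_{d'}(W)^2}$. This implies that $D_{d'}(W) < 2\epsilon - \epsilon^2 = O(\epsilon)$.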

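Next, we show that $D_{d'}(W) < \epsilon$ implies $Z_{d'}(W) > 1 - O(\epsilon)$. We need the following lemma:

Lemma A.1. For constants $0 \le a \le b \le 1$, with $b - a \le \delta$,

\[
\sqrt{ab} \ge \frac{a + b}{2} - \frac{\delta}{2} .
\]

Proof: Note that

\[
\frac{a + b}{2} - \sqrt{ab} \le \max_{0 \le x - a \le \delta} \left( \frac{a + x}{2} - \sqrt{ax} \right) .
\]

We have

\[
\frac{d}{dx} \left( \frac{a + x}{2} - \sqrt{ax} \right) = \frac{1}{2} - \frac{a}{2\sqrt{ax}} \ge 0
\]

for all $x \ge a$. Therefore the maximum is attained at $x = a + \delta$. Therefore,

\[
\frac{a + b}{2} - \sqrt{ab} \le \frac{a + (a + \delta)}{2} - \sqrt{a(a + \delta)} .
\]

The maximum of the right hand side is attained at $a = 0$, hence,

\[
\frac{a + b}{2} - \sqrt{ab} \le \frac{\delta}{2} .
\]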

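Assume $D_{d'}(W) < \epsilon$. We have

\begin{align*}
1 - Z_{d'}(W) &= 1 - \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \sqrt{W(y|x)\, W(y|x + d')} \\
&= \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \left( \frac{W(y|x) + W(y|x + d')}{2} - \sqrt{W(y|x)\, W(y|x + d')} \right) \\
&\overset{(a)}{\le} \frac{1}{q} \sum_{x \in G} \sum_{y \in \mathcal{Y}} \frac{1}{2} \left| W(y|x) - W(y|x + d') \right| \\
&= D_{d'}(W)
\end{align*}

where (a) follows from Lemma A.1 with $a = W(y|x)$, $b = W(y|x + d')$ and $\delta = |W(y|x) - W(y|x + d')|$. This shows that $D_{d'}(W) < \epsilon$ implies $Z_{d'}(W) > 1 - \epsilon$.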
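The two inequalities relating $Z_{d'}(W)$ and $D_{d'}(W)$, and Lemma A.1 itself, lend themselves to a quick numerical sanity check. The sketch below assumes $G = \mathbb{Z}_q$ with addition modulo $q$, randomly drawn test channels, and the normalization $D_d(W) = \frac{1}{2q}\sum_x \sum_y |W(y|x) - W(y|x+d)|$. The helper names are hypothetical and the check is an illustration, not a proof.

```python
import math
import random

def random_channel(q, n_out, rng):
    """Return W[x][y], a random row-stochastic matrix (hypothetical test channel)."""
    rows = []
    for _ in range(q):
        row = [rng.random() for _ in range(n_out)]
        total = sum(row)
        rows.append([v / total for v in row])
    return rows

def z_d(W, d):
    """Z_d(W) = (1/q) sum_x sum_y sqrt(W(y|x) W(y|x+d)) over Z_q."""
    q = len(W)
    return sum(math.sqrt(a * b)
               for x in range(q) for a, b in zip(W[x], W[(x + d) % q])) / q

def d_d(W, d):
    """D_d(W) = (1/(2q)) sum_x sum_y |W(y|x) - W(y|x+d)| over Z_q."""
    q = len(W)
    return sum(abs(a - b)
               for x in range(q) for a, b in zip(W[x], W[(x + d) % q])) / (2 * q)

def lemma_gap(a, b):
    """(a + b)/2 - sqrt(ab), which Lemma A.1 bounds by |b - a|/2."""
    return (a + b) / 2 - math.sqrt(a * b)

if __name__ == "__main__":
    rng = random.Random(0)
    W = random_channel(4, 5, rng)
    for d in range(1, 4):
        z, dist = z_d(W, d), d_d(W, d)
        print(d, 1 - dist, z, math.sqrt(1 - dist ** 2))
```

For every shift $d$ the printed triple should be non-decreasing, which is the chain $1 - D_d(W) \le Z_d(W) \le \sqrt{1 - D_d(W)^2}$.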



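Next, we show that $D_{d'}(W) < \epsilon$ implies $D_{d'+t_H+M}(W) < O(\epsilon)$. We have

\begin{align*}
D_{d'+t_H+M}(W) &= \frac{1}{2q} \sum_{t_M \in T_M} \sum_{y \in \mathcal{Y}} \left| W(y|t_H + t_M + M) - W(y|t_H + t_M + d' + M) \right| \\
&= \frac{1}{2q} \sum_{t_M \in T_M} \sum_{y \in \mathcal{Y}} \left| \frac{1}{|M|} \sum_{m \in M} W(y|t_H + t_M + m) - \frac{1}{|M|} \sum_{m \in M} W(y|t_H + t_M + d' + m) \right| \\
&\le \frac{1}{2q|M|} \sum_{t_M \in T_M} \sum_{y \in \mathcal{Y}} \sum_{m \in M} \left| W(y|t_H + t_M + m) - W(y|t_H + t_M + d' + m) \right| \\
&\le \frac{1}{2q|M|} \cdot 2q\, D_{d'}(W) .
\end{align*}

This shows that $D_{d'}(W) < \epsilon$ implies $D_{d'+t_H+M}(W) < \frac{\epsilon}{|M|} = O(\epsilon)$.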



We have shown that $Z_{d'}(W) > 1 - \epsilon$ implies $D_{d'}(W) < 2\epsilon - \epsilon^2 = O(\epsilon)$. This implies $D_{d'+t_H+M}(W) < \frac{2\epsilon - \epsilon^2}{|M|} = O(\epsilon)$ and this in turn implies $Z_{d'+t_H+M}(W) > 1 - \frac{2\epsilon - \epsilon^2}{|M|} = 1 - O(\epsilon)$.



Remark A.2. For an arbitrary Abelian group $G$, let $H \le G$ be an arbitrary subgroup and let $M$ be any maximal subgroup of $H$. If for some $d \in H \setminus M$, $Z_d(W) > 1 - \epsilon$, then with a similar argument as above, we can show that $Z_{d+t_H+M}(W) > 1 - O(\epsilon)$ where $W$ is defined by (11).



E. Alternate Proof for a Lower Bound on $Z_{d'+t_H+M}(W)$

In Appendix D we proved that $Z_{d'}(W) > 1 - \epsilon$ implies $Z_{d'+t_H+M}(W) > 1 - O(\epsilon)$. In this part, we give an alternate proof of this statement for the $\mathbb{Z}_{p^r}$ case.
Assume $Z_{d'}(W) > 1 - \epsilon$. It follows that



E 



y&y 



Similar to the previous case, we have for all $x \in G$,



< qe 



'1 - Z{W{,^,+2d'}) < 2^e 
Repeated application of the above lemma yields, for all $x, x' \in G$ with $x - x' \in \langle d' \rangle$,



^'l-Z{W{,^,,y)<q^e (21) 

We have 

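\begin{align*}
Z_{d'+t_H+M}(W) &= \frac{1}{q} \sum_{t_M \in T_M} \sum_{y \in \mathcal{Y}} \sqrt{W(y|t_H + t_M + M)\, W(y|t_H + t_M + d' + M)} \\
&= \frac{1}{q} \sum_{t_M \in T_M} \sum_{y \in \mathcal{Y}} \sqrt{\sum_{m, m' \in M} \frac{1}{|M|^2}\, W(y|t_H + t_M + m)\, W(y|t_H + t_M + d' + m')} \\
&\overset{(a)}{\ge} \frac{1}{q} \sum_{t_M \in T_M} \sum_{y \in \mathcal{Y}} \sum_{m, m' \in M} \frac{1}{|M|^2} \sqrt{W(y|t_H + t_M + m)\, W(y|t_H + t_M + d' + m')} \\
&\ge \frac{1}{q} \sum_{t_M \in T_M} \min_{m, m' \in M} \sum_{y \in \mathcal{Y}} \sqrt{W(y|t_H + t_M + m)\, W(y|t_H + t_M + d' + m')}
\end{align*}

where (a) follows since $\sqrt{\cdot}$ is a concave function. Let $x = t_H + t_M + m$ and $x' = t_H + t_M + d' + m'$. It follows that $x' - x = d' + (m' - m)$. Since $d', m', m \in H$ we have $x' - x \in H$. Since $G$ and hence $H$ are $\mathbb{Z}_{p^r}$ rings, it follows that $d' \in H \setminus M$ generates $H$; hence $x' - x \in \langle d' \rangle$. We can use (21) to get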

^e 



Zd'+t„+M{W)>- Y "^i^,,(l-'?'^) = l . 



It follows that $Z_{d'+t_H+M}(W) > 1 - O(\epsilon)$.

F. The Rate of Polarization

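Recall that for $t = 0, \dots, r$, $(Z^t)^{(n)} = \sum_{d \notin H_t} Z_d(W^{(J_n)})$ where $J_n$ is uniform over $\{1, 2, \dots, 2^n\}$. For $t = 0, \dots, r$, define $(Z^t_{\max})^{(n)} = \max_{d \notin H_t} Z_d(W^{(J_n)})$ where $J_n$ is the same as above. Since for all $d \in G$, $Z_d(W^+) = Z_d(W)^2$, it follows that $Z^t_{\max}(W^+) \le Z^t_{\max}(W)^2$. It has been shown in [2, p. 6] that

\[
Z_d(W^-) \le 2 Z_d(W) + \sum_{\Delta} Z_\Delta(W)\, Z_{d+\Delta}(W) .
\]

Note that for any $\Delta \in G$, $d \notin H_t$ implies that either $\Delta \notin H_t$ or $d + \Delta \notin H_t$. Therefore, $d \notin H_t$ implies either $Z_\Delta(W) \le Z^t_{\max}(W)$ or $Z_{d+\Delta}(W) \le Z^t_{\max}(W)$ (or both). Since $Z_\Delta(W)$ and $Z_{d+\Delta}(W)$ both take values from $[0, 1]$, it follows that

\[
Z_\Delta(W)\, Z_{d+\Delta}(W) \le Z^t_{\max}(W) .
\]

Therefore, for any $d \notin H_t$, $Z_d(W^-) \le 2 Z_d(W) + q Z^t_{\max}(W)$. Hence

\begin{align*}
Z^t_{\max}(W^-) &= \max_{d \notin H_t} Z_d(W^-) \\
&\le \max_{d \notin H_t} \left( 2 Z_d(W) + q Z^t_{\max}(W) \right) \\
&= (q + 2)\, Z^t_{\max}(W) .
\end{align*}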

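Since for all $d$, $Z_d$ converges to a Bernoulli random variable, it follows that $(Z^t_{\max})^{(n)}$ also converges to a $\{0, 1\}$-valued random variable $(Z^t_{\max})^{(\infty)}$. Note that $P\left( (Z^t_{\max})^{(\infty)} = 0 \right) = P\left( (Z^t)^{(\infty)} = 0 \right) = \sum_{s=t}^{r} p_s$. Therefore, $(Z^t_{\max})^{(n)}$ satisfies the conditions of [11, Theorem 1] and hence

\[
\lim_{n \to \infty} P\left( (Z^t_{\max})^{(n)} < 2^{-2^{\beta n}} \right) = P\left( (Z^t_{\max})^{(\infty)} = 0 \right)
\]

for any $\beta < \frac{1}{2}$. It clearly follows that $\lim_{n \to \infty} P\left( q (Z^t_{\max})^{(n)} < 2^{-2^{\beta n}} \right) = P\left( (Z^t_{\max})^{(\infty)} = 0 \right)$. Note that the event $\{ (Z^t)^{(n)} < 2^{-2^{\beta n}} \}$ includes the event $\{ q (Z^t_{\max})^{(n)} < 2^{-2^{\beta n}} \}$. Therefore,

\[
\lim_{n \to \infty} P\left( (Z^t)^{(n)} < 2^{-2^{\beta n}} \right) \ge P\left( (Z^t_{\max})^{(\infty)} = 0 \right) .
\]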



Similarly, for an arbitrary Abelian group $G$ and a subgroup $H$ of $G$, define $(Z^H_{\max})^{(n)} = \max_{d \notin H} Z_d(W^{(J_n)})$ where $J_n$ is defined as above. It is straightforward to show that $(Z^H_{\max})^{(n)}$ satisfies the conditions of [11, Theorem 1]. Therefore, with an argument similar to above, we can show that

\[
\lim_{n \to \infty} P\left( (Z^H)^{(n)} < 2^{-2^{\beta n}} \right) \ge P\left( (Z^H)^{(\infty)} = 0 \right)
\]

for any $\beta < \frac{1}{2}$.
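The identity $Z_d(W^+) = Z_d(W)^2$ used at the start of this appendix can be confirmed numerically on small examples. The sketch below assumes $G = \mathbb{Z}_q$ with addition modulo $q$, the transform $W^+(y_1, y_2, u_1|u_2) = \frac{1}{q} W(y_1|u_1 + u_2) W(y_2|u_2)$ as in Appendix B, and a randomly drawn test channel. The function names are hypothetical.

```python
import math
import random
from itertools import product

def random_channel(q, n_out, rng):
    """Return W[x][y], a random row-stochastic matrix (hypothetical test channel)."""
    rows = []
    for _ in range(q):
        row = [rng.random() for _ in range(n_out)]
        total = sum(row)
        rows.append([v / total for v in row])
    return rows

def z_d(W, d):
    """Z_d(W) = (1/q) sum_x sum_y sqrt(W(y|x) W(y|x+d)) over Z_q."""
    q = len(W)
    return sum(math.sqrt(a * b)
               for x in range(q) for a, b in zip(W[x], W[(x + d) % q])) / q

def plus_transform(W):
    """Rows indexed by u2; outputs enumerate (y1, y2, u1) in a fixed order."""
    q, n_out = len(W), len(W[0])
    outputs = list(product(range(n_out), range(n_out), range(q)))
    return [[W[(u1 + u2) % q][y1] * W[u2][y2] / q for y1, y2, u1 in outputs]
            for u2 in range(q)]

if __name__ == "__main__":
    rng = random.Random(0)
    W = random_channel(4, 3, rng)
    W_plus = plus_transform(W)
    for d in range(1, 4):
        print(d, z_d(W_plus, d), z_d(W, d) ** 2)
```

The two printed columns should agree up to floating-point rounding. The check covers only the $W^+$ identity; the bound on $Z_d(W^-)$ is taken from [2] and is not re-derived here.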